COD_20230831_HOST_3_analysis_figure_table

Minsik Kim

2023-09-25

Loading packages

#===============================================================================
#BTC.LineZero.Header.1.1.0
#===============================================================================
#R Markdown environment setup and reporting utility.
#===============================================================================
#RLB.Dependencies:
#   knitr, magrittr, pacman, rio, rmarkdown, rmdformats, tibble, yaml
#===============================================================================
#Input for document parameters, libraries, file paths, and options.
#=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=
knitr::opts_chunk$set(message=FALSE, warning = FALSE)



path_working <- 
        ifelse(sessionInfo()[1]$R.version$platform == "x86_64-pc-linux-gnu",
               "/home/bagel/minsik/",
               ifelse(sessionInfo()[1]$R.version$platform == "aarch64-apple-darwin20",
                      "/Volumes/macdrive/Dropbox/", 
                      "/Dropbox (Personal)"))

path_library <- 
        ifelse(sessionInfo()[1]$R.version$platform == "x86_64-pc-linux-gnu",
               "/home/bagel/R/x86_64-pc-linux-gnu-library/4.1/",
               "/Library/Frameworks/R.framework/Resources/library/")

str_libraries <- c("readxl", "phyloseq", "tidyverse", "pacman", "yaml", "ggplot2", "vegan", "microbiome", "ggpubr", "viridis", "decontam", "gridExtra", "ggpubr", "lme4", "lmerTest", "writexl", "harrietr", "Maaslin2", "ggtext", "ggpmisc", "gamm4", "reshape2", "kableExtra", "knitr", "ggtree", "car", "mediation", "lemon", "qvalue")
        
YAML_header <-
'---
title: "Host-DNA depletion analysis"
author: "Minsik Kim"
date: "2032.09.25"
output:
    rmdformats::downcute:
        downcute_theme: "chaos"
        code_folding: hide
        fig_width: 6
        fig_height: 6
---'
seed <- "20230925"

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#Loads libraries, file paths, and other document options.
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
FUN.LineZero.Boot <- function() {
    .libPaths(path_library)

    require(pacman)
    pacman::p_load(char = c("knitr", "rmarkdown", "rmdformats", "yaml"))

    knitr::opts_knit$set(root.dir = path_working)

    str_libraries |> unique() |> sort() -> str_libraries
    pacman::p_load(char = str_libraries)

    set.seed(seed)
}
FUN.LineZero.Boot()
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#Outputs R environment report.
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
FUN.LineZero.Report <- function() {
    cat("Line Zero Environment:\n\n")
    paste("R:", pacman::p_version(), "\n") |> cat()
    cat("Libraries:\n")
    for (str_libraries in str_libraries) {
        paste(
            "    ", str_libraries, ": ", pacman::p_version(package = str_libraries),
            "\n", sep = ""
        ) |> cat()
    }
    paste("\nOperating System:", pacman::p_detectOS(), "\n") |> cat()
    paste("    Library Path:", path_library, "\n") |> cat()
    paste("    Working Path:", path_working, "\n") |> cat()
    paste("Seed:", seed, "\n\n") |> cat()
    cat("YAML Header:\n")
    cat(YAML_header)
}
FUN.LineZero.Report()
## Line Zero Environment:
## 
## R: 4.4.1 
## Libraries:
##     readxl: 1.4.3
##     phyloseq: 1.48.0
##     tidyverse: 2.0.0
##     pacman: 0.5.1
##     yaml: 2.3.10
##     ggplot2: 3.5.1
##     vegan: 2.6.8
##     microbiome: 1.26.0
##     ggpubr: 0.6.0
##     viridis: 0.6.5
##     decontam: 1.24.0
##     gridExtra: 2.3
##     ggpubr: 0.6.0
##     lme4: 1.1.35.5
##     lmerTest: 3.1.3
##     writexl: 1.5.1
##     harrietr: 0.2.3
##     Maaslin2: 1.18.0
##     ggtext: 0.1.2
##     ggpmisc: 0.6.0
##     gamm4: 0.2.6
##     reshape2: 1.4.4
##     kableExtra: 1.4.0
##     knitr: 1.48
##     ggtree: 3.12.0
##     car: 3.1.3
##     mediation: 4.5.0
##     lemon: 0.4.9
##     qvalue: 2.36.0
## 
## Operating System: Darwin 
##     Library Path: /Library/Frameworks/R.framework/Resources/library/ 
##     Working Path: /Volumes/macdrive/Dropbox/ 
## Seed: 20230925 
## 
## YAML Header:
## ---
## title: "Host-DNA depletion analysis"
## author: "Minsik Kim"
## date: "2032.09.25"
## output:
##     rmdformats::downcute:
##         downcute_theme: "chaos"
##         code_folding: hide
##         fig_width: 6
##         fig_height: 6
## ---

1. Loading data

1.1. phyloseq object

1.2. qPCR data (controls)

LIST OF PRIMARY QUESTIONS AND CORRESPONDING ANALYSES

STUDY AIMS

Aim 1. What is the efficiency of host depletion for each method?

    •   % host DNA measured by mNGS
    o   Sequencing failure rates
    •   % host DNA measured by qPCR 

Aim 2. Did host depletion change microbial community composition?

    •   Alpha diversity (microbial species richness, microbial predicted functional richness)
    •   Beta diversity (Morisita-Horn distance compared to control (not host-depleted) sample)
    •   Differential abundance

Aim 3: Is there effect modification by sample type?

Aim 4. Does host depletion increase the risk of contamination?

Aim 1: What is the efficiency of host depletion for each method?

1a. Did treatment change % host DNA?

Figure

    A: Raw reads (do you really need to log10 transform y-axis?)
    B: Host mapped reads
    C: Final reads (QC’d, non-human)
    D: % Host DNA 

Statistical model

    ### linear mixed effects model to account for repeated measures (multiple aliquots per individual)
    ### Outcome = % host DNA (mNGS reads mapping to human genome/QC’d reads)
    ### Predictors
            Host depletion method (categorical, comparison group = control not host-depleted)
            Sample type
    ### testing for interaction term to justify stratified analysis
    ###  % host DNA ~ method + sample_type + method*sample_type + (1|subjid), report interaction p-value.
    ### Stratified analysis
    ###  %host ~ method + (1|subjid), report beta [95% CI], p-value for each sample type
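
The two-step plan above (interaction test, then stratified models) can be sketched as follows. This is a minimal illustration on simulated data, not the analysis itself: the data frame `d`, the column `host_pct`, and the effect sizes are hypothetical, the interaction is tested with a likelihood-ratio comparison (one of several reasonable choices), and the confidence intervals are Wald intervals.

```r
library(lme4)

set.seed(1)
# Simulated long-format data (hypothetical): 10 subjects x 3 sample types x 3 methods
d <- expand.grid(subjid = factor(1:10),
                 sample_type = c("BAL", "Nasal", "Sputum"),
                 method = c("Untreated", "lyPMA", "Benzonase"))
d$method <- relevel(factor(d$method), ref = "Untreated") # control as comparison group
subj_eff <- rnorm(10, sd = 5)                            # subject-level random intercepts
d$host_pct <- 80 - 20 * (d$method != "Untreated") -
        15 * (d$method == "Benzonase" & d$sample_type == "Nasal") +
        subj_eff[as.integer(d$subjid)] + rnorm(nrow(d), sd = 5)

# Step 1: interaction test (models fitted by ML so they can be compared)
m_main <- lmer(host_pct ~ method + sample_type + (1 | subjid), data = d, REML = FALSE)
m_int  <- lmer(host_pct ~ method * sample_type + (1 | subjid), data = d, REML = FALSE)
p_interaction <- anova(m_main, m_int)[2, "Pr(>Chisq)"]

# Step 2: stratified models, one per sample type, with beta and Wald 95% CI
strat <- lapply(split(d, d$sample_type), function(x) {
        fit <- lmer(host_pct ~ method + (1 | subjid), data = x)
        cbind(beta = fixef(fit), confint(fit, parm = "beta_", method = "Wald"))
})

p_interaction
strat
```

Per-coefficient p-values are not shown here; with lmerTest loaded (as in this document's library list), `summary()` of each stratified fit would report them.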
    

1b. Did host depletion work successfully for sequencing?

Statistical Model

    ### Logistic mixed-effects model
            ### Outcome = library prep and sequencing failure rate
            ### (==1 if failed library prep, ==0 if not)
            ### Report in text sequencing failure (n) stratified by sample type and treatment method.
    ## testing for interaction term to justify stratified analysis
            ### fail ~ method + sample_type + method*sample_type + (1|subjid), report interaction p-value.
    ### stratified analysis: 
            ### fail ~ method + (1|subjid), report OR [95% CI], p-value for each sample type
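
A minimal sketch of the stratified logistic mixed-effects model, again on simulated data. The data frame `d`, the failure probabilities, and the number of subjects are hypothetical; odds ratios are obtained by exponentiating the fixed effects and their Wald confidence limits.

```r
library(lme4)

set.seed(2)
# Simulated data for a single sample type (hypothetical): 30 subjects x 3 methods
d <- expand.grid(subjid = factor(1:30),
                 method = c("Untreated", "lyPMA", "QIAamp"))
d$method <- relevel(factor(d$method), ref = "Untreated")
u <- rnorm(30, sd = 0.5)                        # subject-level random intercepts
eta <- -1.5 + 1.0 * (d$method == "lyPMA") +
        0.5 * (d$method == "QIAamp") + u[as.integer(d$subjid)]
d$fail <- rbinom(nrow(d), 1, plogis(eta))       # 1 = failed library prep, 0 = not

# fail ~ method + (1|subjid), binomial family with logit link
fit <- glmer(fail ~ method + (1 | subjid), data = d, family = binomial)

# Odds ratios with Wald 95% CI
or_table <- exp(cbind(OR = fixef(fit),
                      confint(fit, parm = "beta_", method = "Wald")))
or_table
```

With few failures per stratum, a model like this may not converge or may give very wide intervals; in that case reporting the raw failure counts, as planned above, is the more robust summary.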

Final reads

    ### Investigate distribution of final reads and apply proper transformation.
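
One simple way to compare candidate transformations is to check how close each version is to normality. The sketch below uses simulated, right-skewed read counts (hypothetical values, not study data) and compares the Shapiro-Wilk W statistic before and after a log10 transformation; a W closer to 1 indicates a more normal-looking distribution.

```r
set.seed(3)
# Simulated final read counts (hypothetical): right-skewed, as read depths often are
final_reads <- round(rlnorm(60, meanlog = log(1e6), sdlog = 1))

# Shapiro-Wilk W statistic on the raw and log10 scales
w_raw   <- shapiro.test(final_reads)$statistic
w_log10 <- shapiro.test(log10(final_reads))$statistic

c(raw = unname(w_raw), log10 = unname(w_log10))
```

A histogram or Q-Q plot of each version (`hist()`, `qqnorm()`) would normally accompany this check, and zero counts would need handling before taking logs.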

Figure

    A: Raw reads
    B: Host mapped reads
    C: Final reads
    D: Host DNA ratio
    

model (to justify stratified analysis)

    ### final reads ~ method + sample_type + method*sample_type + (1|subjid), just report interaction p-value.

model (stratified analysis)

    # final reads ~ method + (1|subjid), report effect size [95% CI], p-value for each sample type.
    

Aim 2. Did host depletion change microbial community composition?

Figure

    # Alpha diversity.  Species richness by treatment, faceted by sample type

    # Beta diversity. Morisita-Horn distance of each treated compared to untreated sample  
    # forest plot - x-axis as distance (size of bias) and y axis as treatment. Facet data by sample type

Alpha diversity

    # Linear mixed effects model
            ## Outcome 
            ### species richness
            ## inverse simpson
    # stratified analyses by sample type
            ## report the significant method*sample_type interaction term to justify stratified analysis
            ## species_richness ~ method + sample_type + method * sample_type + (1|subjid)
            ## InvSimp ~ sample type + treatment + sample type * treatment + (1|subject_id)
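
For reference, the two outcomes named above can be computed by hand from an abundance table. The sketch below uses a toy count matrix (hypothetical taxa and samples, taxa in rows) and base R only; the values should agree with `vegan::specnumber()` and `vegan::diversity(index = "invsimpson")` applied to the transposed matrix.

```r
# Toy count matrix (hypothetical): taxa in rows, samples in columns
counts <- matrix(c(10,  0, 5,
                   10,  0, 5,
                    0, 20, 5,
                    0,  0, 5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("taxon", 1:4), c("s1", "s2", "s3")))

# Species richness: number of taxa with non-zero abundance in each sample
richness <- colSums(counts > 0)

# Relative abundances per sample
p <- sweep(counts, 2, colSums(counts), "/")

# Inverse Simpson: 1 / sum(p_i^2)
inv_simpson <- 1 / colSums(p^2)

# Shannon: -sum(p_i * log(p_i)), treating 0 * log(0) as 0
shannon <- -colSums(ifelse(p > 0, p * log(p), 0))

rbind(richness, inv_simpson, shannon)
```

In this toy example a perfectly even sample has an inverse Simpson index equal to its richness (s1 and s3), while a sample dominated by one taxon has an index of 1 (s2).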

Beta diversity

    ### Outcome = Morisita Horn 
    # PERMANOVA
            ## Overall: MH ~ sample type + treatment + sample type * treatment, strata = subject_id
            ## Stratified: MH ~ method, strata = subject_id
    # Calculate Morisita-Horn distance between control and host depleted sample and use that as an outcome in linear model
            ## Change in MH ~ method + sample_type + method*sample_type + log10(reads)
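
The Morisita-Horn dissimilarity between a treated aliquot and its untreated control can be written out directly. The function below is an illustrative base-R sketch (the name `morisita_horn` and the example vectors are hypothetical); it follows the same formula that `vegan::vegdist(method = "horn")` documents, so vegan can be used for the full distance matrix.

```r
# Morisita-Horn dissimilarity between two abundance vectors x and y
# d = 1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * N_x * N_y),
# where lambda = sum(abundance^2) / N^2 and N is the sample total
morisita_horn <- function(x, y) {
        nx <- sum(x)
        ny <- sum(y)
        lambda_x <- sum(x^2) / nx^2
        lambda_y <- sum(y^2) / ny^2
        1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * nx * ny)
}

control <- c(10, 20, 30, 0)   # hypothetical untreated sample
treated <- c(12, 18, 25, 5)   # hypothetical host-depleted aliquot

morisita_horn(control, treated)       # small value: similar communities
morisita_horn(control, control * 10)  # 0: the index ignores sequencing depth
morisita_horn(c(1, 0), c(0, 1))       # 1: no shared taxa
```

Because the index depends on relative rather than absolute abundances, it is less sensitive to differences in sequencing depth between treated and untreated samples than count-based distances.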

Differential abundance of species

    # Outcome = relative abundance of species

Figure

    # Volcano plot with Mock, BAL, Nasal and Sputum 
            ## Note: Mock community was placed in Zymo DNA/RNA Protect prior to freezing. Zymo DNA/RNA Protect is a mild detergent, so it increases the susceptibility of microbial cells to lysis; putting these samples through host depletion is not recommended.

Statistical model

    # MaAsLin2
            ## Stratified by sample type
                    ### Species ~ lyPMA + Benzonase + HostZERO + MolYsis + QIAamp + (1|subjid). Comparison group is untreated sample
            ## Figure: Balloon plots for BAL, Nasal and Sputum (Figure). Add q-val, mean relative abundance
    

Proportion gram negative

    # Species collapsed into single category: gram negative bacteria, gram positive bacteria, fungi
    # Statistical model
            ## % gram negative ~ sample type + treatment + sample type * treatment + (1|subject_id)
            ## Stratified analysis
                    ### % gram negative ~ treatment + (1|subject_id)
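
Collapsing species into the three categories amounts to summing relative abundances by group. A minimal base-R sketch with a toy table follows; the abundance values and the object names (`rel_abund`, `category`) are hypothetical, and in practice the category lookup would come from a curated annotation of every species in the data.

```r
# Toy relative abundance table (hypothetical values): species in rows, samples in columns
rel_abund <- matrix(c(0.4, 0.1,
                      0.3, 0.2,
                      0.2, 0.6,
                      0.1, 0.1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("Escherichia coli",
                                      "Pseudomonas aeruginosa",
                                      "Staphylococcus aureus",
                                      "Candida albicans"),
                                    c("s1", "s2")))

# Category for each row of rel_abund, in the same order
category <- c("gram_negative", "gram_negative", "gram_positive", "fungi")

# Sum relative abundances within each category
collapsed <- rowsum(rel_abund, group = category)

# Outcome for the mixed model: % gram negative per sample
pct_gram_negative <- 100 * collapsed["gram_negative", ]
pct_gram_negative
```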

Does the change differ by sample type for predicted function as well? Repeat all of the above analyses using predicted microbial function (KEGG, CPM) instead of species.

Aim 3: Is there effect modification by sample type?

All above analyses with interaction term and stratified analyses

If effect modification is present, which treatment is the best for each sample type?

Make a summary result by treatment, stratified by sample type

    ### Issues from sequencing (library failure, no change in host depletion, etc.)
    ### Changes in alpha diversity (Mean species richness change by each subject)
    ### Changes in beta diversity (statistical test results with significant factor)
    

Secondary analyses

1. Is qPCR an alternative to mNGS for estimating % Host DNA?

Figure

    ### A: Correlation Plot. x-axis with host DNA proportion measured with shotgun metagenomic sequencing vs y-axis as that by qPCR.

Statistics

    ### Correlation coefficient
    ### Bland-Altman statistics
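
The agreement statistics listed above can be computed in a few lines of base R. The sketch below uses made-up host-DNA proportions for five samples (hypothetical values, not study data) and reports the Pearson correlation together with the Bland-Altman bias and 95% limits of agreement (bias ± 1.96 SD of the paired differences).

```r
# Bland-Altman bias and 95% limits of agreement for paired measurements a and b
bland_altman <- function(a, b) {
        d <- a - b
        bias <- mean(d)
        sd_d <- sd(d)
        c(bias = bias,
          lower = bias - 1.96 * sd_d,
          upper = bias + 1.96 * sd_d)
}

# Hypothetical host DNA proportions for the same five samples
host_qpcr <- c(0.90, 0.80, 0.50, 0.30, 0.95)
host_mngs <- c(0.85, 0.82, 0.40, 0.35, 0.90)

r  <- cor(host_qpcr, host_mngs)           # Pearson correlation coefficient
ba <- bland_altman(host_qpcr, host_mngs)  # agreement statistics

r
ba
```

A high correlation alone does not show that the two methods agree; the bias and the width of the limits of agreement indicate whether qPCR could stand in for mNGS in practice.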

2. Mediation analysis. Host depletion treatment increases species and predicted microbial functional richness due to higher effective sequencing depth

    # Mediation R package
            ## outcome = species richness
            ## exposure = each host depletion treatment compared to untreated control
            ## mediator = final reads
            ## mediator-outcome confounders = sample type
            ## exposure-mediator confounders = NA
            ## outcome model = Mixed effects linear regression 
            ## mediator model = Mixed effects linear regression

Aim 4. Does host depletion increase the risk of contamination?

  1. Were there any contaminants in the sequencing result? If species richness increased after treatment, was it due to increased coverage with higher final reads?

     # Run decontam (Davis 2018. PMID: 30558668)
             ## List potential contaminants with their prevalences in samples and negative controls 
             ## Sensitivity analysis where potential contaminant species identified removed
                     ### species richness ~ treatment + (1|subjid)
     # Decontaminate data by estimating microbial population using mock community data.
             ## Run Tinyvamp  (arXiv: 2204.12733)
             ## Make adjusted relative abundance table by calculating taxon-specific detection efficiencies relative to a reference taxon (Enterococcus).
             ## Known community for efficiency estimation: negative (no taxa) + positive (mock community)

Data inputs

Meta data

  • qPCR - bacteria

  • qPCR - human

  • qPCR host %

  • Raw reads

  • final reads

  • sequencing host %

  • library prep failure status

  • subject_id

  • treatment

  • sample_type

Sequencing result

  • samples

  • controls

Analysis preparation

Analysis prep

Loading data

# Loading files -----------------------------------------------------------
#loading tidy phyloseq object
phyloseq_unfiltered <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/Phyloseq/PHY_20230521_MGK_host_tidy.rds")

v_phyloseq <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/Phyloseq/PHY_20240212_MGK_host_marker_magu_Jessica.rds")

sample_data <- sample_data(phyloseq_unfiltered$phyloseq_count)
#Metagenote description was made as below
sample_data %>% 
        data.frame(check.names = F) %>% 
        dplyr::select(c("baylor_id", "baylor_other_id", "sample_type", "treated")) %>%
        mutate(nephele_description =
          case_when(
                  sample_type == "Mock" &
                    treated == 0 ~ "Control (untreated) positive mock community",
                  sample_type == "Mock" &
                    treated == 1 ~ "host DNA depleted positive mock community",
                  sample_type == "Neg." &
                    treated == 0 ~ "Control (untreated) negative reagent only controls",
                  sample_type == "Neg." &
                    treated == 1 ~ "host DNA depleted negative reagent only controls",
                  sample_type == "BAL" &
                    treated == 0 ~ "Control (untreated) BAL",
                  sample_type == "BAL" &
                    treated == 1 ~ "host DNA depleted BAL",
                  sample_type == "Nasal" &
                    treated == 0 ~ "Control (untreated) nasal swabs",
                  sample_type == "Nasal" &
                    treated == 1 ~ "host DNA depleted nasal swabs",
                  sample_type == "Sputum" &
                    treated == 0 ~ "Control (untreated) sputum",
                  sample_type == "Sputum" &
                    treated == 1 ~ "host DNA depleted sputum"
                  )
          ) %>%
        group_by(nephele_description) %>%
        mutate(rownum = row_number(nephele_description)) %>%
        mutate(nephele_description_numbered = 
                        paste(nephele_description,
                              rownum,
                              sep = "_")
                ) %>%
        write.csv("Project_SICAS2_microbiome/5_Scripts/MGK/Host_depletion_git/data/LOG_20230913_MGK_HOST_nephele_samplegroup.csv")

Tinyvamp results

# Loading files -----------------------------------------------------------
#loading tinyvamp p_hat estimates (one file per treatment), transposed with t()


tinyvamp_untreated <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/HOST_tinyvamp_decontaminated_Amy_Willis/20230807_amy/untreated_p_hats_all_v3.RDS") %>%
        t()

tinyvamp_lypma <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/HOST_tinyvamp_decontaminated_Amy_Willis/20230807_amy/lypma_p_hats_all_v3.RDS") %>%
        t()

tinyvamp_benzonase <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/HOST_tinyvamp_decontaminated_Amy_Willis/20230807_amy/benzonase_p_hats_all_v3.RDS") %>%
        t()

tinyvamp_host_zero <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/HOST_tinyvamp_decontaminated_Amy_Willis/20230807_amy/hostzero_p_hats_all_v3.RDS") %>%
        t()

tinyvamp_molysis <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/HOST_tinyvamp_decontaminated_Amy_Willis/20230807_amy/molysis_p_hats_all_v3.RDS") %>% 
        t()

tinyvamp_qiaamp <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/HOST_tinyvamp_decontaminated_Amy_Willis/20230807_amy/qiaamp_p_hats_all_v3.RDS") %>% 
        t()

        
#tinyvamp_benzonase <- read_rds("Project_SICAS2_microbiome/4_Data/2_Tidy/HOST_tinyvamp_decontaminated_Amy_Willis/benzonase_p_hats.RDS")

sample_data_tv <- phyloseq_unfiltered$phyloseq_rel %>%
        subset_samples(sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
        sample_data %>% data.frame() %>%
        mutate(treatment = case_when(treatment == "MolYsis" ~ "Molysis",
                                     treatment == "HostZERO" ~ "Host zero",
                                     .default = treatment)) %>%
        mutate("names"= paste(.$original_sample, .$treatment, sep = "_")) %>%
        remove_rownames() %>% column_to_rownames("names") %>% sample_data()

phyloseq_tv <- merge_phyloseq(
        merge_phyloseq(sample_data_tv,
                       otu_table(tinyvamp_untreated,
                                 taxa_are_rows = T)),
        merge_phyloseq(sample_data_tv,
                       otu_table(tinyvamp_lypma,
                                 taxa_are_rows = T)),
        merge_phyloseq(sample_data_tv,
                       otu_table(tinyvamp_benzonase,
                                 taxa_are_rows = T)),
        merge_phyloseq(sample_data_tv,
                       otu_table(tinyvamp_host_zero,
                                 taxa_are_rows = T)),
        merge_phyloseq(sample_data_tv,
                       otu_table(tinyvamp_molysis,
                                 taxa_are_rows = T)),
        merge_phyloseq(sample_data_tv,
                       otu_table(tinyvamp_qiaamp,
                                 taxa_are_rows = T)),
        tax_table(phyloseq_unfiltered$phyloseq_rel)
) 

sample_data(phyloseq_tv)$treatment <-
        phyloseq_tv %>%
        sample_data() %>%
        data.frame(check.names = F) %>% 
        mutate(treatment =
                       case_when(
                               treatment == 
                                       "Molysis"~ 
                                       "MolYsis",
                               treatment == 
                                       "Host zero" ~
                                       "HostZERO", 
                               .default = treatment)) %>%
        .$treatment %>%
        factor(., levels = c("Untreated", 
                             "lyPMA",
                             "Benzonase",
                             "HostZERO",
                             "MolYsis",
                             "QIAamp"))

Alpha diversity indices

alpha_diversity <- function(data) {
        otu_table <- otu_table(data) #%>% .[, colSums(.) !=0]
        S.obs <- rowSums(t(otu_table) != 0) # observed richness (number of non-zero taxa per sample)
        sample_data <- sample_data(data)
        data_evenness <- vegan::diversity(t(otu_table)) / log(vegan::specnumber(t(otu_table))) # calculate evenness index (Shannon / log richness) using vegan package
        data_shannon <- vegan::diversity(t(otu_table), index = "shannon") # calculate Shannon index using vegan package
        data_hill <- exp(data_shannon)                           # calculate Hill number of order 1 (exponential of Shannon)
        data_dominance <- microbiome::dominance(otu_table, index = "all", rank = 1, aggregate = TRUE) # dominance (Berger-Parker index), etc.
        data_invsimpson <- vegan::diversity(t(otu_table), index = "invsimpson")                          # calculate inverse Simpson index using vegan package
        alpha_diversity <- cbind(S.obs, data_shannon, data_hill, data_invsimpson, data_evenness,data_dominance) # combine all indices in one data table
        merge(data.frame(sample_data), alpha_diversity, by = 0, all = T) %>% column_to_rownames(var = "Row.names") # return sample data with the indices appended
}
sample_data(v_phyloseq$viral_rel) <- sample_data(alpha_diversity(v_phyloseq$viral_rel))

sample_data(phyloseq_unfiltered$phyloseq_count)$V.obs <- 
        sample_data(v_phyloseq$viral_rel) %>% 
        data.frame %>% 
        dplyr::select("S.obs") %>% 
#        mutate(S.obs = case_when(is.na(S.obs) ~ 0,
#                                 .default = S.obs)) %>%
        .[sample_names(phyloseq_unfiltered$phyloseq_count),]

sample_data(phyloseq_unfiltered$phyloseq_rel)$V.obs <- 
        sample_data(v_phyloseq$viral_rel) %>% 
        data.frame %>% 
        dplyr::select("S.obs") %>% 
#        mutate(S.obs = case_when(is.na(S.obs) ~ 0,
#                                 .default = S.obs)) %>%
        .[sample_names(phyloseq_unfiltered$phyloseq_count),]

sample_data(phyloseq_unfiltered$phyloseq_path_rpk)$V.obs <- 
        sample_data(v_phyloseq$viral_rel) %>% 
        data.frame %>% 
        dplyr::select("S.obs") %>% 
#        mutate(S.obs = case_when(is.na(S.obs) ~ 0,
#                                 .default = S.obs)) %>%
        .[sample_names(phyloseq_unfiltered$phyloseq_count),]

phyloseq <- phyloseq_unfiltered





sample_data(phyloseq_unfiltered$phyloseq_rel) <- sample_data(alpha_diversity(phyloseq_unfiltered$phyloseq_rel))
sample_data(phyloseq_unfiltered$phyloseq_count) <- sample_data(alpha_diversity(phyloseq_unfiltered$phyloseq_count)) 
sample_data(phyloseq_unfiltered$phyloseq_path_rpk) <- sample_data(alpha_diversity(phyloseq_unfiltered$phyloseq_path_rpk))  

3.1. Screening of treatment effect

i. Did treatment change host % in qPCR results?

qPCR and sequencing

Fig. S1. qPCR figure

Figure S1. Host depletion effects measured by qPCR. (A) total DNA (16S bacterial DNA + human), (B) host DNA, (C) bacterial DNA, and (D) proportion of host DNA.

#1A: Change in total DNA (qPCR)

fS1a <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = log10(DNA_host_nondil + DNA_bac_nondil))) +
        geom_jitter(aes(color = treatment, x = treatment), lwd = 0.2,
                    alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        facet_wrap(~sample_type, scale = "free_x") +
        ylab("log<sub>10</sub>(qPCR total DNA)<br>(ng/μL)") +
        labs(tag = "A") +
        guides(fill = guide_legend(nrow = 1))



#1B: Change in human DNA (qPCR)
fS1b <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = log10(DNA_host_nondil))) +
        geom_jitter(aes(col = treatment, x = treatment), 
                    lwd = 0.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        facet_wrap(~sample_type, scale = "free_x") +
        ylab("log<sub>10</sub>(qPCR host DNA)<br>(ng/μL)") +
        labs(tag = "B")

#1C: Change in 16S DNA (qPCR)
fS1c <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = log10(DNA_bac_nondil))) +
        geom_jitter(aes(col = treatment, x = treatment),
                    lwd = 0.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        facet_wrap(~sample_type, scale = "free_x") +
        ylab("log<sub>10</sub>(qPCR bacterial DNA)<br>(ng/μL)") +
        labs(tag = "C")
        
#1D. Change in % host (qPCR)
fS1d <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = host_proportion)) +
        geom_jitter(aes(col = treatment, x = treatment),
                    lwd = 0.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        facet_wrap(~sample_type, scale = "free_x") +
        ylab("Host DNA ratio") +
        labs(tag = "D")


#output for markdown
figureS1 <- ggarrange(fS1a, fS1b, fS1c, fS1d, common.legend = T , align = "hv")

figureS1 

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS1.png",   # The file path to save the figure to
    width = 180, # The width of the plot in mm (see units)
    height = 170, # The height of the plot in mm (see units)
    units = "mm",
    res = 600
) #fixing multiple page issue

figureS1
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2

Fig. S2. numbers

Peggy’s comment on 20240912:

I think you need to distinguish between the host depletion negative controls (stratified by each method, i.e. lyPMA, Benzonase, etc.) and the sequencing negative controls.

Revise Table C12 in your R2R so that, besides the row for negative control, it is also stratified by the type of negative control.

Create a stacked barplot faceted by type of negative control.

Table updated- negative controls stratified by sample type
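
The summary table that follows builds each "median (Q1, Q3)" cell with a long `paste(format(round(...)))` expression. A small helper can express the same formatting once. This is only a sketch: `median_iqr` is a hypothetical name that does not appear in the original script, and it assumes the same rounding and thousands separator as the table code.

```r
# Format a numeric vector as "median (Q1, Q3)" with fixed decimals and a thousands separator
median_iqr <- function(x, digits = 1, na.rm = FALSE) {
        fmt <- function(v) format(round(v, digits), nsmall = digits, big.mark = ",")
        q <- quantile(x, c(0.25, 0.75), na.rm = na.rm)
        paste0(fmt(median(x, na.rm = na.rm)), " (", fmt(q[1]), ", ", fmt(q[2]), ")")
}

median_iqr(c(1, 2, 3, 4, 5))
median_iqr(c(1, 2, 3, 4, 5, NA), na.rm = TRUE)
```

Inside `summarise()`, a column such as the human DNA summary could then be written as `median_iqr(DNA_host_ng_uL * 1000)`, and the final reads column as `median_iqr(Final_reads / 1000000, na.rm = TRUE)`.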

table1_mock_and_neg <- sample_data(phyloseq$phyloseq_count) %>% data.frame() %>% 
        #dplyr::filter(sample_type %in% c("Sputum", "Nasal", "BAL")) %>% 
        # mutate(S.obs = case_when(S.obs==0 ~ NA,
        #                          .default = S.obs),
        #        F.obs = case_when(F.obs==0 ~ NA,
        #                          .default = F.obs),
        #        V.obs = case_when(V.obs==0 ~ NA,
        #                          .default = V.obs)) %>% 
        mutate(sample_type = case_when(sample_type == "Neg." & treatment == "Untreated" ~ "Neg. (extraction)",
                                       sample_type == "Neg." & treatment == "lyPMA" ~ "Neg. (lyPMA)",
                                       sample_type == "Neg." & treatment == "Benzonase" ~ "Neg. (Benzonase)",
                                       sample_type == "Neg." & treatment == "HostZERO" ~ "Neg. (HostZERO)",
                                       sample_type == "Neg." & treatment == "MolYsis" ~ "Neg. (MolYsis)",
                                       sample_type == "Neg." & treatment == "QIAamp" ~ "Neg. (QIAamp)",
                                       .default = sample_type),
               sample_type = factor(sample_type, 
                                    levels = c("Neg. (extraction)",
                                               "Neg. (lyPMA)",
                                               "Neg. (Benzonase)",
                                               "Neg. (HostZERO)",
                                               "Neg. (MolYsis)",
                                               "Neg. (QIAamp)",
                                               "Mock",
                                               "BAL",
                                               "Nasal",
                                               "Sputum"))) %>%
        group_by (sample_type) %>%
        summarise(`N` = n(),
            #      `Total DNA <br>ng/µL` = paste(format(round(median(picogreen_ng_ul),2), nsmall = 2, big.mark = ","), "<br>(", format(round(quantile(picogreen_ng_ul, 0.25),2), nsmall = 2, big.mark = ","), ", ", format(round(quantile(picogreen_ng_ul, 0.75),2), nsmall = 2, big.mark = ","), ")", sep = ""),
               `Human DNA <br>pg/µL` = paste(format(round(median(DNA_host_ng_uL*1000),1), nsmall = 1, big.mark = ","), " (", format(round(quantile(DNA_host_ng_uL*1000, 0.25),1), nsmall = 1, big.mark = ","), ", ", format(round(quantile(DNA_host_ng_uL*1000, 0.75),1), nsmall = 1, big.mark = ","), ")", sep = ""),
               `Bacterial DNA <br>pg/µL` = paste(format(round(median(DNA_bac_ng_uL*1000),1), nsmall = 1, big.mark = ","), " (", format(round(quantile(DNA_bac_ng_uL*1000, 0.25),1), nsmall = 1, big.mark = ","), ", ", format(round(quantile(DNA_bac_ng_uL*1000, 0.75),1), nsmall = 1, big.mark = ","), ")", sep = ""),
              `QC'd reads<br>reads x 10<sup>6</sup>` = paste(format(round(median(Reads_after_trim/1000000),1), nsmall = 1, big.mark = ","), " (", format(round(quantile(Reads_after_trim/1000000, 0.25),1), nsmall = 1, big.mark = ","), ", ", format(round(quantile(Reads_after_trim/1000000, 0.75),1), nsmall = 1, big.mark = ","), ")", sep = ""),
              `Host reads<br>%` = paste(format(round(median(sequencing_host_prop*100),1),
                                                nsmall = 1, big.mark = ","),
                                         " (",
                                         format(round(quantile(sequencing_host_prop * 100,
                                                               0.25),
                                                      1),
                                                nsmall = 1,
                                                big.mark = ","), 
                                         ", ", 
                                         format(round(quantile(sequencing_host_prop * 100, 0.75),1), 
                                                nsmall = 1, 
                                                big.mark = ","), 
                                         ")", 
                                         sep = ""),
             `Final reads<br>reads x 10<sup>6</sup>` = paste(format(round(median(Final_reads/1000000,
                                                                                 na.rm = T),
                                                                          1),
                                                                    nsmall = 1, big.mark = ","),
                                                             " (", 
                                                             format(round(quantile(Final_reads/1000000,
                                                                                            0.25,
                                                                                            na.rm = T),
                                                                                   1),
                                                                             nsmall = 1,
                                                                             big.mark = ","),
                                                             ", ",
                                                             format(round(quantile(Final_reads/1000000, 
                                                                                   0.75, 
                                                                                   na.rm = T),
                                                                          1), 
                                                                    nsmall = 1, 
                                                                    big.mark = ","), 
                                                             ")",
                                                             sep = "")
              
            ) %>% 
        data.frame(check.names = F) %>% 
        arrange(sample_type) %>%
        rename(`Sample` = sample_type) %>%
        mutate_all(linebreak) %>% kbl(format = "html", escape = F) %>% kable_styling(full_width = 0, html_font = "sans")

table1_mock_and_neg
| Sample | N | Human DNA (pg/µL) | Bacterial DNA (pg/µL) | QC’d reads (reads x 10^6) | Host reads (%) | Final reads (reads x 10^6) |
|---|---|---|---|---|---|---|
| Neg. (extraction) | 6 | 0.0 (0.0, 0.0) | 0.0 (0.0, 0.0) | 0.9 (0.7, 1.5) | 16.8 (8.0, 20.1) | 0.8 (0.6, 1.1) |
| Neg. (lyPMA) | 5 | 0.0 (0.0, 0.0) | 0.0 (0.0, 0.0) | 0.9 (0.9, 1.0) | 16.5 (14.9, 16.7) | 0.7 (0.7, 0.8) |
| Neg. (Benzonase) | 5 | 0.0 (0.0, 0.0) | 0.0 (0.0, 0.0) | 1.2 (1.1, 1.5) | 17.5 (15.5, 27.4) | 0.9 (0.8, 1.0) |
| Neg. (HostZERO) | 5 | 0.0 (0.0, 0.0) | 0.0 (0.0, 0.0) | 1.5 (1.5, 3.2) | 13.1 (10.2, 28.2) | 1.3 (1.0, 1.8) |
| Neg. (MolYsis) | 5 | 0.0 (0.0, 0.0) | 0.0 (0.0, 0.0) | 1.2 (1.2, 1.6) | 19.9 (16.6, 20.0) | 1.0 (0.9, 1.3) |
| Neg. (QIAamp) | 5 | 0.0 (0.0, 0.0) | 0.0 (0.0, 0.0) | 1.3 (1.3, 1.7) | 17.6 (13.3, 20.1) | 1.2 (1.0, 1.4) |
| Mock | 31 | 0.0 (0.0, 0.0) | 1,456.4 (944.6, 2,876.7) | 99.6 (74.4, 108.1) | 0.6 (0.6, 0.7) | 99.6 (73.7, 107.4) |
| BAL | 30 | 53.7 (13.5, 936.7) | 2.0 (0.3, 10.2) | 81.2 (36.1, 136.3) | 98.1 (92.3, 99.3) | 1.4 (0.5, 2.8) |
| Nasal | 35 | 7.1 (0.7, 121.0) | 10.2 (1.6, 24.1) | 47.1 (10.9, 79.0) | 78.4 (24.0, 92.9) | 7.4 (1.0, 24.8) |
| Sputum | 30 | 218.6 (51.5, 7,583.2) | 39.0 (26.5, 121.5) | 90.3 (68.3, 106.1) | 89.1 (62.1, 95.9) | 7.0 (2.6, 38.3) |
save_kable(table1_mock_and_neg, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/table1_mock_neg.html", self_contained = T)
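
Each summary cell in the table above comes from the same `paste(format(round(...)))` pattern. As a minimal sketch, assuming a hypothetical helper named `fmt_median_iqr()` (it is not part of the original script), the "median (Q1, Q3)" formatting could be factored out like this:

```r
# Hypothetical helper, not in the original script: format a numeric vector
# as "median (Q1, Q3)", mirroring the format(round(...)) calls used above.
fmt_median_iqr <- function(x, digits = 1, na.rm = FALSE) {
        # quantile() stops on NA input unless na.rm = TRUE
        q <- quantile(x, probs = c(0.5, 0.25, 0.75), na.rm = na.rm, names = FALSE)
        s <- vapply(q,
                    function(v) format(round(v, digits), nsmall = digits, big.mark = ","),
                    character(1))
        paste0(s[1], " (", s[2], ", ", s[3], ")")
}

# Toy values for illustration only
example_cell <- fmt_median_iqr(c(1000, 2000, 3000))
example_cell
```

Inside `summarise()`, a call such as `fmt_median_iqr(DNA_host_ng_uL * 1000)` would then stand in for one of the long `paste()` expressions.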

Fig. S2. sequencing results

Fig. S2. (A) Raw reads, (B) final reads after removing low-quality and host-mapped reads, and (C) sum of MetaPhlAn-mapped reads by sample type.

sample_data %>% data.frame %>% subset(., .$sample_type == "Neg.")
# How did the samples fail in library prep?
sample_data %>%
        data.frame() %>%
        mutate(sample_type = factor(.$sample_type, 
                                    labels = c("Negative control", "Mock", "BAL", "Nasal", "Sputum")))
figS2_raw_reads <- sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>% otu_table %>% colSums()) %>%
        mutate(sample_type = factor(.$sample_type, 
                                    labels = c("Negative control", "Mock", "BAL", "Nasal", "Sputum"))) %>%
        ggplot(aes(x = reorder(baylor_other_id, -Raw_reads),
                               y = Raw_reads/1000000,
                               col = sample_type)) +
                geom_point() +
                theme_classic(base_family = "sans") +
                theme(axis.title.y = element_markdown(), axis.text.x = element_blank()) +
                ylab("Raw reads X 10<sup>6</sup>") +
                xlab("Samples") +
        guides(col=guide_legend(title="Sample type")) +
        scale_color_brewer(type = "qual", palette = 6) +
        labs(tag = "A") #+
        #ylim(c(0, 8.5))

figS2_raw_reads_log <- sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>% otu_table %>% colSums()) %>%
        mutate(sample_type = factor(.$sample_type, 
                                    labels = c("Negative control", "Mock", "BAL", "Nasal", "Sputum"))) %>%
        ggplot(aes(x = reorder(baylor_other_id, -Raw_reads),
                               y = log10(Raw_reads+1),
                               col = sample_type)) +
                geom_point() +
                theme_classic(base_family = "sans") +
                theme(axis.title.y = element_markdown(), axis.text.x = element_blank()) +
                ylab("log<sub>10</sub>(raw reads)") +
                xlab("Samples") +
        guides(col=guide_legend(title="Sample type")) +
        scale_color_brewer(type = "qual", palette = 6) +
        labs(tag = "A") #+
        #ylim(c(0, 8.5))

figS2_final_reads <- sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>% otu_table %>% colSums()) %>%
        mutate(sample_type = factor(.$sample_type, 
                                    labels = c("Negative control", "Mock", "BAL", "Nasal", "Sputum"))) %>%
        ggplot(aes(x = reorder(baylor_other_id, -Final_reads),
                               y = Final_reads/1000000,
                               col = sample_type)) +
                geom_point() +
                theme_classic(base_family = "sans") +
                theme(axis.title.y = element_markdown(), axis.text.x = element_blank()) +
                ylab("Final reads X 10<sup>6</sup>") +
                xlab("Samples") +
        guides(col=guide_legend(title="Sample type")) +
        scale_color_brewer(type = "qual", palette = 6) +
        labs(tag = "B") #+
        #ylim(c(0, 8.5))


figS2_final_reads_log <- sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>% otu_table %>% colSums()) %>%
        mutate(sample_type = factor(.$sample_type, 
                                    labels = c("Negative control", "Mock", "BAL", "Nasal", "Sputum"))) %>%
        ggplot(aes(x = reorder(baylor_other_id, -Final_reads),
                               y = log10(Final_reads + 1),
                               col = sample_type)) +
                geom_point() +
                theme_classic(base_family = "sans") +
                theme(axis.title.y = element_markdown(), axis.text.x = element_blank()) +
                ylab("log<sub>10</sub>(final reads)") +
                xlab("Samples") +
        guides(col=guide_legend(title="Sample type")) +
        scale_color_brewer(type = "qual", palette = 6) +
        labs(tag = "B")


figS2_total_reads <- sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>% otu_table %>% colSums()) %>%
        mutate(sample_type = factor(.$sample_type, 
                                    labels = c("Negative control", "Mock", "BAL", "Nasal", "Sputum"))) %>%
        ggplot(aes(x = reorder(baylor_other_id, -total_read),
                               y = total_read/1000000,
                               col = sample_type)) +
                geom_point() +
                theme_classic(base_family = "sans") +
                theme(axis.title.y = element_markdown(), axis.text.x = element_blank()) +
                ylab("Microbial reads X 10<sup>6</sup>") +
                xlab("Samples") +
        guides(col=guide_legend(title="Sample type")) +
        scale_color_brewer(type = "qual", palette = 6) +
        labs(tag = "C")


figS2_total_reads_log <- sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>% otu_table %>% colSums()) %>%
        mutate(sample_type = factor(.$sample_type, 
                                    labels = c("Negative control", "Mock", "BAL", "Nasal", "Sputum"))) %>%
        ggplot(aes(x = reorder(baylor_other_id, -total_read),
                               y = log10(total_read + 1),
                               col = sample_type)) +
                geom_point() +
                theme_classic(base_family = "sans") +
                theme(axis.title.y = element_markdown(), axis.text.x = element_blank()) +
                ylab("log<sub>10</sub>(Microbial reads)") +
                xlab("Samples") +
        guides(col=guide_legend(title="Sample type")) +
        scale_color_brewer(type = "qual", palette = 6) +
        labs(tag = "C") 


figS2 <- ggarrange(figS2_raw_reads, figS2_final_reads, figS2_total_reads, ncol = 1, nrow = 3, common.legend = T)


figS2_log <- ggarrange(figS2_raw_reads_log, figS2_final_reads_log, figS2_total_reads_log, ncol = 1, nrow = 3, common.legend = T)

figS2

figS2_log

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS2_revised.png",   # Output file path
    width = 180, # Width of the plot in mm (see units)
    height = 210, # Height of the plot in mm (see units)
    units = "mm",
    res = 600
) #fixing multiple page issue

figS2
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS2_revised_log.png",   # Output file path
    width = 180, # Width of the plot in mm (see units)
    height = 210, # Height of the plot in mm (see units)
    units = "mm",
    res = 600
) #fixing multiple page issue

figS2_log
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
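
Because `png()` is called with `units = "mm"`, the `width` and `height` values are millimetres, while `res` is the nominal resolution in pixels per inch. The sketch below shows the approximate pixel dimensions this implies. The helper `mm_to_px()` is an assumption made for illustration and is not part of the script, and the exact rounding used by the graphics device may differ.

```r
# Hypothetical helper: convert a length in mm to pixels at a given ppi
mm_to_px <- function(mm, res) {
        round(mm / 25.4 * res)  # 25.4 mm per inch
}

# Settings used for Figure S2: 180 mm x 210 mm at 600 ppi
fig_px <- c(width = mm_to_px(180, 600), height = mm_to_px(210, 600))
fig_px
```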

ii. How did the sequencing results change?

Changes in sequencing output

Table 1 is generated later (after QC to remove samples that failed sequencing, etc.)

Library failure

Failure crosstab

sample_data <- sample_data(phyloseq_unfiltered$phyloseq_count) %>% 
          data.frame() %>%
          mutate(
              sample_sums = sample_sums(phyloseq_unfiltered$phyloseq_count),
              lib_failed = case_match(lib_failed, TRUE ~ 1, FALSE ~0),
              seq_failed = ifelse(sample_sums == 0, 1, 0),
              lib_seq_failed = ifelse(lib_failed ==1 | seq_failed == 1, 1, 0),
              prop_host = round((Host_mapped/Reads_after_trim), 1)
              )
        xtabs(lib_failed ~ sample_type + treatment, data = sample_data)
##            treatment
## sample_type Untreated lyPMA Benzonase HostZERO MolYsis QIAamp
##      Neg.           0     0         0        0       0      0
##      Mock           0     0         0        0       0      0
##      BAL            0     1         0        1       1      0
##      Nasal          0     4         0        2       4      0
##      Sputum         0     0         0        0       0      0
        xtabs(seq_failed ~ sample_type + treatment, data = sample_data)
##            treatment
## sample_type Untreated lyPMA Benzonase HostZERO MolYsis QIAamp
##      Neg.           0     0         0        0       0      0
##      Mock           0     0         0        0       0      0
##      BAL            1     1         0        0       0      0
##      Nasal          0     0         0        0       0      0
##      Sputum         0     0         0        0       0      0
        xtabs(lib_seq_failed ~ sample_type + treatment, data = sample_data)
##            treatment
## sample_type Untreated lyPMA Benzonase HostZERO MolYsis QIAamp
##      Neg.           0     0         0        0       0      0
##      Mock           0     0         0        0       0      0
##      BAL            1     2         0        1       1      0
##      Nasal          0     4         0        2       4      0
##      Sputum         0     0         0        0       0      0
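
The crosstabs above rely on `xtabs()` summing a 0/1 indicator within each sample type and treatment cell. A minimal sketch on toy data (the values are invented for illustration and are not from the study) shows the same flag logic and tabulation:

```r
# Toy data for illustration only; values are not from the study
toy <- data.frame(
        sample_type = factor(c("BAL", "BAL", "BAL", "Nasal", "Nasal", "Nasal"),
                             levels = c("BAL", "Nasal")),
        treatment   = factor(c("Untreated", "lyPMA", "lyPMA",
                               "Untreated", "lyPMA", "lyPMA"),
                             levels = c("Untreated", "lyPMA")),
        lib_failed  = c(0, 1, 0, 0, 1, 1),
        sample_sums = c(0, 10, 25, 40, 5, 0)
)

# Same flag logic as in the chunk above
toy$seq_failed     <- ifelse(toy$sample_sums == 0, 1, 0)
toy$lib_seq_failed <- ifelse(toy$lib_failed == 1 | toy$seq_failed == 1, 1, 0)

# With a 0/1 column on the left-hand side, xtabs() sums it within each cell,
# so each entry is the number of failed samples in that cell
fail_tab <- xtabs(lib_seq_failed ~ sample_type + treatment, data = toy)
fail_tab
```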

Library failure - OR (all samples)

A stratified OR cannot be calculated, as some sample type and treatment combinations had no library failures.

Effect size, standard error (SE) and z-value from a test of library prep failure using a binomial generalized linear mixed-effects model: glmer(lib_failed ~ sample_type + treatment + sample_type * treatment + (1|subject_id))

gm1 <- glmer(lib_failed ~ sample_type + treatment + sample_type * treatment + (1|subject_id),
      data = sample_data %>% 
              subset(sample_type %in% c("BAL", "Nasal", "Sputum")) %>% 
              data.frame(check.names = F),
      family = binomial)

gm1 %>% summary %>% .$coefficients
##                                         Estimate  Std. Error       z value
## (Intercept)                          -21.7185928    20795.51 -1.044388e-03
## sample_typeNasal                       0.1552804    25464.53  6.097908e-06
## sample_typeSputum                     -8.4802015  1614863.76 -5.251342e-06
## treatmentlyPMA                        20.0406740    20795.51  9.637019e-04
## treatmentBenzonase                    -3.8606553   144818.73 -2.665854e-05
## treatmentHostZERO                     20.0406709    20795.51  9.637017e-04
## treatmentMolYsis                      20.0406701    20795.51  9.637017e-04
## treatmentQIAamp                       -5.4238408   313832.06 -1.728262e-05
## sample_typeNasal:treatmentlyPMA        3.2667884    25464.53  1.282878e-04
## sample_typeSputum:treatmentlyPMA     -22.3386079  5344185.65 -4.179984e-06
## sample_typeNasal:treatmentBenzonase    0.8974271   168154.90  5.336907e-06
## sample_typeSputum:treatmentBenzonase  -1.4086264 22564912.93 -6.242552e-08
## sample_typeNasal:treatmentHostZERO     0.8448602    25464.53  3.317792e-05
## sample_typeSputum:treatmentHostZERO  -20.3882571  2509755.81 -8.123602e-06
## sample_typeNasal:treatmentMolYsis      3.1358205    25464.53  1.231446e-04
## sample_typeSputum:treatmentMolYsis   -19.6253680  2080626.07 -9.432434e-06
## sample_typeNasal:treatmentQIAamp      -3.2248662  1761497.88 -1.830752e-06
## sample_typeSputum:treatmentQIAamp     -0.4359967 30057041.91 -1.450564e-08
##                                       Pr(>|z|)
## (Intercept)                          0.9991667
## sample_typeNasal                     0.9999951
## sample_typeSputum                    0.9999958
## treatmentlyPMA                       0.9992311
## treatmentBenzonase                   0.9999787
## treatmentHostZERO                    0.9992311
## treatmentMolYsis                     0.9992311
## treatmentQIAamp                      0.9999862
## sample_typeNasal:treatmentlyPMA      0.9998976
## sample_typeSputum:treatmentlyPMA     0.9999967
## sample_typeNasal:treatmentBenzonase  0.9999957
## sample_typeSputum:treatmentBenzonase 1.0000000
## sample_typeNasal:treatmentHostZERO   0.9999735
## sample_typeSputum:treatmentHostZERO  0.9999935
## sample_typeNasal:treatmentMolYsis    0.9999017
## sample_typeSputum:treatmentMolYsis   0.9999925
## sample_typeNasal:treatmentQIAamp     0.9999985
## sample_typeSputum:treatmentQIAamp    1.0000000

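None of the associations were significant in the binomial generalized linear mixed-effects model. The very large standard errors are consistent with (quasi-)complete separation, since several sample type and treatment combinations had no failures, so these Wald tests are not informative.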
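
Estimates near ±20 with standard errors in the thousands are what a logistic model typically produces when one group has no events at all. The sketch below reproduces that pattern on toy data. It uses plain `glm()` as a simplified stand-in for the `glmer()` call above, and the data are invented for illustration.

```r
# Toy data for illustration only: group "A" has no failures at all
toy <- data.frame(
        group  = factor(rep(c("A", "B"), each = 5)),
        failed = c(0, 0, 0, 0, 0,  1, 1, 0, 0, 0)
)

# Plain glm() is a simplified stand-in for the glmer() call in the text;
# warnings about fitted probabilities of 0 or 1 are expected here
fit <- suppressWarnings(glm(failed ~ group, data = toy, family = binomial))

# The intercept (log-odds of failure in group "A") drifts toward -Inf,
# and its standard error becomes very large
sep_coefs <- summary(fit)$coefficients
sep_coefs
```

In this situation the Wald p-values sit close to 1 regardless of how different the groups are, which matches the coefficient table above.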

With glmer(lib_failed ~ sample_type + (1|subject_id)):

glmer(lib_failed ~ sample_type + (1|subject_id),
      data = sample_data %>% 
              subset(sample_type %in% c("BAL", "Nasal", "Sputum")) %>% 
              data.frame(check.names = F),
      family = binomial) %>%
        summary
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: lib_failed ~ sample_type + (1 | subject_id)
##    Data: 
## sample_data %>% subset(sample_type %in% c("BAL", "Nasal", "Sputum")) %>%  
##     data.frame(check.names = F)
## 
##      AIC      BIC   logLik deviance df.resid 
##     69.4     79.6    -30.7     61.4       91 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.6325 -0.6325 -0.3333  0.0000  3.0000 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0        0       
## Number of obs: 95, groups:  subject_id, 20
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -2.197e+00  6.086e-01  -3.610 0.000306 ***
## sample_typeNasal   1.281e+00  7.144e-01   1.793 0.072970 .  
## sample_typeSputum -3.377e+01  1.179e+07   0.000 0.999998    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) smpl_N
## smpl_typNsl -0.852       
## smpl_typSpt  0.000  0.000
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

There appear to be sample_type-specific effects (since some sample_type * treatment combinations did not show any failure).

Library failure (stratified)

glmer(lib_failed ~ treatment + (1|subject_id))

-> Cannot be run for sputum (no failed samples).

For BAL

GLMER result

glmer_libfail_bal <- glmer(lib_failed ~ treatment + (1|subject_id), data = sample_data %>% data.frame %>% dplyr::filter(sample_type %in% c("BAL")), family = "binomial")
glmer_libfail_bal %>% summary
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: lib_failed ~ treatment + (1 | subject_id)
##    Data: sample_data %>% data.frame %>% dplyr::filter(sample_type %in%  
##     c("BAL"))
## 
##      AIC      BIC   logLik deviance df.resid 
##     28.7     38.5     -7.3     14.7       23 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7529 -0.3404  0.0000  0.0000  1.9092 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 1.294    1.138   
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)          -26.24    1430.39  -0.018    0.985
## treatmentlyPMA        24.49    1430.39   0.017    0.986
## treatmentBenzonase   -34.45    3045.52  -0.011    0.991
## treatmentHostZERO     24.49    1430.39   0.017    0.986
## treatmentMolYsis      24.49    1430.39   0.017    0.986
## treatmentQIAamp     -156.35    3045.52  -0.051    0.959
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -1.000                            
## trtmntBnzns  0.188 -0.188                     
## trtmntHZERO -1.000  1.000 -0.188              
## trtmntMlYss -1.000  1.000 -0.188  1.000       
## trtmntQIAmp  0.188 -0.188 -0.206 -0.188 -0.188
## optimizer (Nelder_Mead) convergence code: 0 (OK)
##  Hessian is numerically singular: parameters are not uniquely determined

No treatment significantly affected library failure for BAL.

For nasal samples, the GLMER fit does not converge. Because all predictors are categorical, rescaling them is not an option.

Host ratio

Host ratio (all samples)

Non-stratified. The p-value of the interaction term was 4.587e-16.

lmer_host_ratio_all <- lmer(sequencing_host_prop * 100 ~ sample_type * treatment + (1|subject_id), 
                    data = sample_data %>%
                            data.frame %>%
                            mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%
                            otu_table %>%
                            colSums()) %>%
                            subset(., total_read != 0))
        
lmer_host_ratio_all
## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: 
## sequencing_host_prop * 100 ~ sample_type * treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%  
##     otu_table %>% colSums()) %>% subset(., total_read != 0)
## REML criterion at convergence: 1052.31
## Random effects:
##  Groups     Name        Std.Dev.
##  subject_id (Intercept) 11.44   
##  Residual               12.04   
## Number of obs: 155, groups:  subject_id, 22
## Fixed Effects:
##                          (Intercept)                       sample_typeMock  
##                              15.8049                              -15.5516  
##                       sample_typeBAL                      sample_typeNasal  
##                              85.3037                               79.4212  
##                    sample_typeSputum                        treatmentlyPMA  
##                              83.1787                               -1.2538  
##                   treatmentBenzonase                     treatmentHostZERO  
##                               8.2009                                5.1396  
##                     treatmentMolYsis                       treatmentQIAamp  
##                               3.4039                                1.2271  
##       sample_typeMock:treatmentlyPMA         sample_typeBAL:treatmentlyPMA  
##                               6.6544                               -2.6414  
##      sample_typeNasal:treatmentlyPMA      sample_typeSputum:treatmentlyPMA  
##                             -26.5524                               -2.5511  
##   sample_typeMock:treatmentBenzonase     sample_typeBAL:treatmentBenzonase  
##                              -7.8214                              -10.8156  
##  sample_typeNasal:treatmentBenzonase  sample_typeSputum:treatmentBenzonase  
##                             -28.4482                              -14.4549  
##    sample_typeMock:treatmentHostZERO      sample_typeBAL:treatmentHostZERO  
##                              -4.6623                              -24.8644  
##   sample_typeNasal:treatmentHostZERO   sample_typeSputum:treatmentHostZERO  
##                             -78.9810                              -50.6196  
##     sample_typeMock:treatmentMolYsis       sample_typeBAL:treatmentMolYsis  
##                              -3.0151                              -22.5705  
##    sample_typeNasal:treatmentMolYsis    sample_typeSputum:treatmentMolYsis  
##                             -53.8806                              -73.0365  
##      sample_typeMock:treatmentQIAamp        sample_typeBAL:treatmentQIAamp  
##                              -0.8505                               -8.9627  
##     sample_typeNasal:treatmentQIAamp     sample_typeSputum:treatmentQIAamp  
##                             -76.3805                              -19.9221

ANOVA result

lmer_host_ratio_all %>% anova() 
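
The ANOVA step checks whether the treatment effect differs by sample type, which is the reason for stratifying afterwards. The sketch below illustrates that logic on simulated data. It uses plain `lm()` as a simplified stand-in for the `lmer()` model (no random effect), and all values are invented for illustration.

```r
set.seed(1)  # arbitrary seed for the simulated toy data

# Simulated data for illustration only; not the study data
toy <- expand.grid(
        sample_type = c("BAL", "Nasal", "Sputum"),
        treatment   = c("Untreated", "HostZERO"),
        replicate   = 1:10
)

# Build in a treatment effect that exists only for nasal samples
toy$host_pct <- 90 -
        50 * (toy$treatment == "HostZERO" & toy$sample_type == "Nasal") +
        rnorm(nrow(toy), sd = 5)

# Plain lm() is a simplified stand-in for the lmer() model in the text
fit_int <- lm(host_pct ~ sample_type * treatment, data = toy)

# p-value of the interaction term from the sequential ANOVA table
int_p <- anova(fit_int)["sample_type:treatment", "Pr(>F)"]
int_p
```

A small interaction p-value indicates that a single pooled treatment effect would be misleading, so per-sample-type models are the more interpretable summary.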

Table S1. Host ratio (stratified)

Table S1. Effect size, standard error (SE) and p-value from a test of host DNA ratio using a linear mixed-effects model with treatment as fixed effect and subject as random effect, fitted with lmer() from the R packages lme4/lmerTest: lmer(Host DNA % ~ treatment + (1|subject_id)). Stratified analyses were conducted for each sample type because the interaction term of sample type and treatment was significant in an ANOVA test (p-value < 0.001) of the model lmer(Host DNA % ~ sample type + treatment + sample type * treatment + (1|subject_id)). The baseline of the categorical variable is untreated, and statistical significance is noted with *: p-value < 0.05, **: p-value < 0.01 and ***: p-value < 0.001.

HostZERO and MolYsis were effective for all sample types. QIAamp was effective for nasal swab and sputum.

Stratified (BAL)

hr_lmer_bal <- lmer(sequencing_host_prop * 100 ~ treatment + (1|subject_id), 
                    data = sample_data %>%
                            data.frame %>%
                            mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%
                            otu_table %>%
                            colSums()) %>%
                            subset(., .$sample_type %in% c("BAL")))

hr_lmer_bal %>%
        summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: sequencing_host_prop * 100 ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%  
##     otu_table %>% colSums()) %>% subset(., .$sample_type %in%      c("BAL"))
## 
## REML criterion at convergence: 196.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4797 -0.1994 -0.0472  0.5522  1.1072 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept)  35.98    5.998  
##  Residual               119.77   10.944  
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          99.639      5.581  18.946  17.853 2.62e-13 ***
## treatmentlyPMA       -3.123      6.922  20.000  -0.451   0.6567    
## treatmentBenzonase   -1.145      6.922  20.000  -0.165   0.8703    
## treatmentHostZERO   -18.255      6.922  20.000  -2.637   0.0158 *  
## treatmentMolYsis    -17.697      6.922  20.000  -2.557   0.0188 *  
## treatmentQIAamp      -6.266      6.922  20.000  -0.905   0.3761    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.620                            
## trtmntBnzns -0.620  0.500                     
## trtmntHZERO -0.620  0.500  0.500              
## trtmntMlYss -0.620  0.500  0.500  0.500       
## trtmntQIAmp -0.620  0.500  0.500  0.500  0.500

Stratified (nasal)

hr_lmer_ns <- lmer(sequencing_host_prop * 100 ~ treatment + (1|subject_id), data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal")))

hr_lmer_ns %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: sequencing_host_prop * 100 ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal"))
## 
## REML criterion at convergence: 275.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6062 -0.5120 -0.1037  0.6480  1.5793 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 179.0    13.38   
##  Residual               417.5    20.43   
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          95.226      7.723  24.557  12.330 5.16e-12 ***
## treatmentlyPMA      -27.681     11.631  23.874  -2.380 0.025655 *  
## treatmentBenzonase  -19.986     11.657  24.145  -1.715 0.099245 .  
## treatmentHostZERO   -73.580     11.657  24.145  -6.312 1.55e-06 ***
## treatmentMolYsis    -50.602     11.631  23.874  -4.351 0.000219 ***
## treatmentQIAamp     -75.415     11.657  24.145  -6.470 1.06e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.465                            
## trtmntBnzns -0.464  0.255                     
## trtmntHZERO -0.464  0.255  0.385              
## trtmntMlYss -0.465  0.235  0.361  0.361       
## trtmntQIAmp -0.464  0.361  0.229  0.229  0.255

Stratified (sputum)

hr_lmer_spt <- lmer(sequencing_host_prop * 100 ~ treatment + (1|subject_id), data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum")))

hr_lmer_spt %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: sequencing_host_prop * 100 ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum"))
## 
## REML criterion at convergence: 192.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.25995 -0.43375  0.06531  0.47119  1.49643 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept)  25.39    5.039  
##  Residual               101.92   10.096  
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          98.984      5.046  20.018  19.616 1.53e-14 ***
## treatmentlyPMA       -3.805      6.385  20.000  -0.596  0.55792    
## treatmentBenzonase   -6.254      6.385  20.000  -0.979  0.33904    
## treatmentHostZERO   -45.480      6.385  20.000  -7.123 6.68e-07 ***
## treatmentMolYsis    -69.633      6.385  20.000 -10.906 7.22e-10 ***
## treatmentQIAamp     -18.695      6.385  20.000  -2.928  0.00832 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.633                            
## trtmntBnzns -0.633  0.500                     
## trtmntHZERO -0.633  0.500  0.500              
## trtmntMlYss -0.633  0.500  0.500  0.500       
## trtmntQIAmp -0.633  0.500  0.500  0.500  0.500

Merged table
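
The table-building blocks that follow repeat the same steps for each stratified model: take the coefficient table, attach confidence intervals, add significance stars, and format an "Effect size (95% CI)" column. As a rough sketch, those steps could live in one function. The helper name `tidy_effects()` is hypothetical, and the demonstration uses `lm()` on the built-in `mtcars` data as a stand-in for the `lmer()` fits.

```r
# Hypothetical helper, not part of the original script
tidy_effects <- function(coefs, ci, digits = 1) {
        # coefs: summary(model)$coefficients, with "Estimate" and "Pr(>|t|)" columns
        # ci:    confint(model); rows are matched to coefs by row name, so extra
        #        rows (e.g. variance components from a mixed model) are dropped
        ci  <- ci[rownames(coefs), , drop = FALSE]
        fmt <- function(v) formatC(v, format = "f", digits = digits)
        p   <- coefs[, "Pr(>|t|)"]
        data.frame(
                `Effect size (95% CI)` = paste0(fmt(coefs[, "Estimate"]),
                                                " (", fmt(ci[, 1]), ", ", fmt(ci[, 2]), ")"),
                `p-value` = round(p, 3),
                ` ` = ifelse(p < 0.001, "***",
                      ifelse(p < 0.01, "**",
                      ifelse(p < 0.05, "*", " "))),
                row.names = rownames(coefs),
                check.names = FALSE
        )
}

# Demonstration on a built-in dataset with lm(), as a stand-in for the lmer() fits
demo_fit <- lm(mpg ~ wt, data = mtcars)
demo_tab <- tidy_effects(summary(demo_fit)$coefficients, confint(demo_fit))
demo_tab
```

With the models in this report, a call such as `tidy_effects(summary(hr_lmer_bal)$coefficients, confint(hr_lmer_bal))` would be the analogous usage, assuming the lmerTest summary keeps the `Pr(>|t|)` column shown in the outputs above.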

hr_lmer_bal_kbl <- hr_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              hr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]

hr_lmer_ns_kbl <- hr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              hr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        



hr_lmer_spt_kbl <- hr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              hr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        


tables1 <- cbind(hr_lmer_bal_kbl, hr_lmer_ns_kbl, hr_lmer_spt_kbl) %>%
    kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 1, "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")
tables1
| | BAL: Effect size (95% CI) | BAL: p-value | Nasal swab: Effect size (95% CI) | Nasal swab: p-value | Sputum: Effect size (95% CI) | Sputum: p-value |
|---|---|---|---|---|---|---|
| (Intercept) | 99.6 (89.4, 109.9) | 0.000 *** | 95.2 (80.9, 109.6) | 0.000 *** | 99.0 (89.8, 108.2) | 0.000 *** |
| lyPMA | -3.1 (-15.7, 9.5) | 0.657 | -27.7 (-49.0, -6.3) | 0.026 * | -3.8 (-15.4, 7.8) | 0.558 |
| Benzonase | -1.1 (-13.8, 11.5) | 0.870 | -20.0 (-41.4, 1.5) | 0.099 | -6.3 (-17.9, 5.4) | 0.339 |
| HostZERO | -18.3 (-30.9, -5.6) | 0.016 * | -73.6 (-94.9, -52.1) | 0.000 *** | -45.5 (-57.1, -33.8) | 0.000 *** |
| MolYsis | -17.7 (-30.3, -5.1) | 0.019 * | -50.6 (-72.0, -29.3) | 0.000 *** | -69.6 (-81.3, -58.0) | 0.000 *** |
| QIAamp | -6.3 (-18.9, 6.3) | 0.376 | -75.4 (-96.9, -54.0) | 0.000 *** | -18.7 (-30.3, -7.1) | 0.008 ** |
save_kable(tables1, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS1.html", self_contained = T)
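The three table-building pipelines above repeat the same two formatting steps: mapping p-values to significance codes and pasting the estimate together with its confidence interval. A minimal base-R sketch of those two steps follows. The helper names `p_to_stars` and `format_effect` are hypothetical and not part of the original script, and the p-values in the usage lines are arbitrary toy values.

```r
# Hypothetical helpers; the names are illustrative and not part of the original script.

# Map p-values to the significance codes used in the tables
# (*** < 0.001, ** < 0.01, * < 0.05, otherwise a blank).
p_to_stars <- function(p) {
  ifelse(p < 0.001, "***",
         ifelse(p < 0.01, "**",
                ifelse(p < 0.05, "*", " ")))
}

# Build an "estimate (lower, upper)" label with a fixed number of decimals.
format_effect <- function(estimate, lower, upper, digits = 1) {
  fmt <- function(x) formatC(round(x, digits), format = "f", digits = digits)
  paste0(fmt(estimate), " (", fmt(lower), ", ", fmt(upper), ")")
}

# Usage with toy p-values and the BAL lyPMA row of the table above.
stars <- p_to_stars(c(0.0004, 0.008, 0.026, 0.657))
label <- format_effect(-3.1, -15.7, 9.5)
stars
label
```

Wrapping the steps in functions like these would let the BAL, nasal swab, and sputum tables share one code path, at the cost of hiding the individual pipeline stages.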

Final reads

Final reads (all samples)

Non-stratified. The p-value of the interaction term was 3.955e-09.

Final reads output

lmer_final_reads <- lmer(Final_reads * 100 ~ sample_type * treatment + (1|subject_id), 
                    data = sample_data %>%
                            data.frame %>%
                            mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%
                            otu_table %>%
                            colSums()) %>%
                            subset(., total_read != 0))

lmer_final_reads %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Final_reads * 100 ~ sample_type * treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%  
##     otu_table %>% colSums()) %>% subset(., total_read != 0)
## 
## REML criterion at convergence: 5743.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2476 -0.3384 -0.0107  0.1503  4.4188 
## 
## Random effects:
##  Groups     Name        Variance  Std.Dev. 
##  subject_id (Intercept) 5.197e+16 2.280e+08
##  Residual               3.548e+18 1.884e+09
## Number of obs: 155, groups:  subject_id, 22
## 
## Fixed effects:
##                                        Estimate Std. Error         df t value
## (Intercept)                           1.185e+08  8.021e+08  1.064e+01   0.148
## sample_typeMock                       9.198e+09  1.134e+09  1.064e+01   8.109
## sample_typeBAL                       -8.967e+07  1.242e+09  3.953e+01  -0.072
## sample_typeNasal                      4.114e+08  1.002e+09  2.139e+01   0.411
## sample_typeSputum                    -4.727e+07  1.168e+09  3.319e+01  -0.040
## treatmentlyPMA                        1.957e+09  1.141e+09  3.622e+19   1.716
## treatmentBenzonase                   -2.317e+07  1.141e+09  2.654e+18  -0.020
## treatmentHostZERO                     3.892e+07  1.141e+09  8.054e+18   0.034
## treatmentMolYsis                     -6.575e+06  1.141e+09  3.917e+18  -0.006
## treatmentQIAamp                      -2.342e+06  1.141e+09  1.867e+18  -0.002
## sample_typeMock:treatmentlyPMA       -6.842e+09  1.613e+09  2.346e+19  -4.241
## sample_typeBAL:treatmentlyPMA        -1.885e+09  1.754e+09  1.069e+20  -1.075
## sample_typeNasal:treatmentlyPMA      -2.353e+09  1.540e+09  1.617e+04  -1.528
## sample_typeSputum:treatmentlyPMA     -1.744e+09  1.649e+09  4.737e+19  -1.058
## sample_typeMock:treatmentBenzonase   -2.499e+08  1.613e+09  8.166e+18  -0.155
## sample_typeBAL:treatmentBenzonase     2.079e+08  1.703e+09  1.086e+05   0.122
## sample_typeNasal:treatmentBenzonase   4.019e+08  1.540e+09  1.588e+04   0.261
## sample_typeSputum:treatmentBenzonase  4.401e+08  1.649e+09  1.045e+19   0.267
## sample_typeMock:treatmentHostZERO     5.152e+08  1.613e+09  1.197e+19   0.319
## sample_typeBAL:treatmentHostZERO      5.142e+08  1.703e+09  1.086e+05   0.302
## sample_typeNasal:treatmentHostZERO    3.692e+09  1.540e+09  1.588e+04   2.398
## sample_typeSputum:treatmentHostZERO   3.483e+09  1.649e+09  1.874e+19   2.112
## sample_typeMock:treatmentMolYsis     -4.309e+08  1.613e+09  8.904e+18  -0.267
## sample_typeBAL:treatmentMolYsis       8.874e+08  1.703e+09  1.086e+05   0.521
## sample_typeNasal:treatmentMolYsis     8.013e+08  1.540e+09  1.617e+04   0.520
## sample_typeSputum:treatmentMolYsis    6.766e+09  1.649e+09  3.134e+19   4.103
## sample_typeMock:treatmentQIAamp       2.329e+09  1.613e+09  7.701e+18   1.444
## sample_typeBAL:treatmentQIAamp        8.553e+08  1.703e+09  1.086e+05   0.502
## sample_typeNasal:treatmentQIAamp      3.664e+09  1.540e+09  1.588e+04   2.380
## sample_typeSputum:treatmentQIAamp     2.157e+09  1.649e+09  4.723e+18   1.308
##                                      Pr(>|t|)    
## (Intercept)                            0.8853    
## sample_typeMock                      7.08e-06 ***
## sample_typeBAL                         0.9428    
## sample_typeNasal                       0.6853    
## sample_typeSputum                      0.9680    
## treatmentlyPMA                         0.0862 .  
## treatmentBenzonase                     0.9838    
## treatmentHostZERO                      0.9728    
## treatmentMolYsis                       0.9954    
## treatmentQIAamp                        0.9984    
## sample_typeMock:treatmentlyPMA       2.22e-05 ***
## sample_typeBAL:treatmentlyPMA          0.2825    
## sample_typeNasal:treatmentlyPMA        0.1264    
## sample_typeSputum:treatmentlyPMA       0.2902    
## sample_typeMock:treatmentBenzonase     0.8769    
## sample_typeBAL:treatmentBenzonase      0.9028    
## sample_typeNasal:treatmentBenzonase    0.7941    
## sample_typeSputum:treatmentBenzonase   0.7896    
## sample_typeMock:treatmentHostZERO      0.7494    
## sample_typeBAL:treatmentHostZERO       0.7627    
## sample_typeNasal:treatmentHostZERO     0.0165 *  
## sample_typeSputum:treatmentHostZERO    0.0347 *  
## sample_typeMock:treatmentMolYsis       0.7893    
## sample_typeBAL:treatmentMolYsis        0.6023    
## sample_typeNasal:treatmentMolYsis      0.6027    
## sample_typeSputum:treatmentMolYsis   4.08e-05 ***
## sample_typeMock:treatmentQIAamp        0.1488    
## sample_typeBAL:treatmentQIAamp         0.6155    
## sample_typeNasal:treatmentQIAamp       0.0173 *  
## sample_typeSputum:treatmentQIAamp      0.1908    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA on lmer result

lmer_final_reads %>% anova() 

Table S2 Final reads (stratified)

lmer( log10(Final reads) ~ treatment + (1|subject_id) )

Table S2. Changes in final reads stratified by sample type, tested with linear mixed-effects models using the R function lmer (lme4/lmerTest packages): lmer(log10(Final reads) ~ Treatment + (1|Subject id)). Stratified analyses were conducted for each sample type because the interaction term of sample type and treatment was significant in an ANOVA test (p-value < 0.001) of the model lmer(log10(Final reads) ~ sample type + treatment + sample type * treatment + (1|subject_id)). Effect sizes with adjusted 95% confidence intervals and p-values are listed. The unit of final reads is reads × 10^6. Statistical significance is noted as *: p-value < 0.05, **: p-value < 0.01, and ***: p-value < 0.001.

Sputum final reads increased after every treatment. Nasal swab final reads increased with HostZERO and QIAamp but decreased with lyPMA. BAL final reads increased with every treatment except lyPMA, for which the change was not significant.

Raw results, BAL, nasal and Sputum, respectively

fr_lmer_bal <- lmer(log10(Final_reads/1000000) ~ treatment + (1|subject_id),
                    data = sample_data %>%
                                data.frame %>%
                                mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%
                                otu_table %>%
                                colSums()) %>%
             subset(., .$sample_type %in% c("BAL")))


fr_lmer_bal %>% 
        summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log10(Final_reads/1e+06) ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%  
##     otu_table %>% colSums()) %>% subset(., .$sample_type %in%      c("BAL"))
## 
## REML criterion at convergence: 45.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65203 -0.51284  0.03098  0.64282  1.40913 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.07389  0.2718  
##  Residual               0.21756  0.4664  
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)   
## (Intercept)         -0.5067     0.2414 18.1625  -2.099  0.05009 . 
## treatmentlyPMA       0.3521     0.2950 20.0000   1.194  0.24656   
## treatmentBenzonase   0.8109     0.2950 20.0000   2.749  0.01238 * 
## treatmentHostZERO    0.9502     0.2950 20.0000   3.221  0.00429 **
## treatmentMolYsis     1.0358     0.2950 20.0000   3.511  0.00220 **
## treatmentQIAamp      1.0481     0.2950 20.0000   3.553  0.00199 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.611                            
## trtmntBnzns -0.611  0.500                     
## trtmntHZERO -0.611  0.500  0.500              
## trtmntMlYss -0.611  0.500  0.500  0.500       
## trtmntQIAmp -0.611  0.500  0.500  0.500  0.500
fr_lmer_ns <- lmer(log10(Final_reads/1000000) ~ treatment + (1|subject_id),
                    data = sample_data %>%
                                data.frame %>%
                                mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%
                                otu_table %>%
                                colSums()) %>%
             subset(., .$sample_type %in% c("Nasal")))


fr_lmer_ns %>% 
        summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log10(Final_reads/1e+06) ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%  
##     otu_table %>% colSums()) %>% subset(., .$sample_type %in%      c("Nasal"))
## 
## REML criterion at convergence: 52.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0897 -0.7843 -0.2293  0.5901  1.6955 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.07592  0.2755  
##  Residual               0.19354  0.4399  
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          0.5170     0.1642 25.0746   3.150 0.004193 ** 
## treatmentlyPMA      -0.5413     0.2499 24.0343  -2.166 0.040468 *  
## treatmentBenzonase   0.1437     0.2505 24.2989   0.574 0.571339    
## treatmentHostZERO    0.8560     0.2505 24.2989   3.418 0.002229 ** 
## treatmentMolYsis     0.2107     0.2499 24.0343   0.843 0.407498    
## treatmentQIAamp      1.0502     0.2505 24.2989   4.193 0.000316 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.472                            
## trtmntBnzns -0.471  0.259                     
## trtmntHZERO -0.471  0.259  0.383              
## trtmntMlYss -0.472  0.239  0.359  0.359       
## trtmntQIAmp -0.471  0.359  0.234  0.234  0.259
fr_lmer_spt <- lmer(log10(Final_reads/1000000) ~ treatment + (1|subject_id),
                    data = sample_data %>%
                                data.frame %>%
                                mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%
                                otu_table %>%
                                colSums()) %>%
             subset(., .$sample_type %in% c("Sputum"))) 


fr_lmer_spt %>% 
        summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log10(Final_reads/1e+06) ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% mutate(total_read = phyloseq_unfiltered$phyloseq_count %>%  
##     otu_table %>% colSums()) %>% subset(., .$sample_type %in%      c("Sputum"))
## 
## REML criterion at convergence: 8.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.89244 -0.59432 -0.06631  0.55937  1.70987 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.00774  0.08798 
##  Residual               0.04975  0.22305 
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)         -0.1734     0.1072 22.0056  -1.617  0.12010    
## treatmentlyPMA       0.5401     0.1411 20.0000   3.828  0.00105 ** 
## treatmentBenzonase   0.8459     0.1411 20.0000   5.996 7.30e-06 ***
## treatmentHostZERO    1.6733     0.1411 20.0000  11.862 1.67e-10 ***
## treatmentMolYsis     1.9901     0.1411 20.0000  14.107 7.42e-12 ***
## treatmentQIAamp      1.4158     0.1411 20.0000  10.036 2.98e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.658                            
## trtmntBnzns -0.658  0.500                     
## trtmntHZERO -0.658  0.500  0.500              
## trtmntMlYss -0.658  0.500  0.500  0.500       
## trtmntQIAmp -0.658  0.500  0.500  0.500  0.500
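Because the stratified models are fitted to log10(Final reads), each treatment coefficient is a difference on the log10 scale. Raising 10 to the coefficient gives the corresponding fold change relative to the untreated control. The sketch below back-transforms three coefficients copied from the summaries above. The vector names are illustrative only.

```r
# Log10-scale coefficients copied from the model summaries above.
log10_effects <- c(
  BAL_QIAamp     = 1.0481,
  Nasal_lyPMA    = -0.5413,
  Sputum_MolYsis = 1.9901
)

# A coefficient b on the log10 scale corresponds to a 10^b-fold change
# relative to the untreated control of the same sample type.
fold_change <- 10^log10_effects

round(fold_change, 2)
```

Values above 1 indicate more final reads than the control, and values below 1 indicate fewer. The same transformation can be applied to the confidence limits.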

Tidified output

fr_lmer_bal_kbl <- fr_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              fr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
fr_lmer_ns_kbl <- fr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              fr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


fr_lmer_spt_kbl <- fr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              fr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


tables2 <- cbind(fr_lmer_bal_kbl, fr_lmer_ns_kbl, fr_lmer_spt_kbl) %>%
    kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 1, "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")
tables2
| | BAL: Effect size (95% CI) | BAL: p-value | Nasal swab: Effect size (95% CI) | Nasal swab: p-value | Sputum: Effect size (95% CI) | Sputum: p-value |
|---|---|---|---|---|---|---|
| (Intercept) | -0.5 (-1.0, -0.1) | 0.050 | 0.5 (0.2, 0.8) | 0.004 ** | -0.2 (-0.4, 0.0) | 0.120 |
| lyPMA | 0.4 (-0.2, 0.9) | 0.247 | -0.5 (-1.0, -0.1) | 0.040 * | 0.5 (0.3, 0.8) | 0.001 ** |
| Benzonase | 0.8 (0.3, 1.3) | 0.012 * | 0.1 (-0.3, 0.6) | 0.571 | 0.8 (0.6, 1.1) | 0.000 *** |
| HostZERO | 1.0 (0.4, 1.5) | 0.004 ** | 0.9 (0.4, 1.3) | 0.002 ** | 1.7 (1.4, 1.9) | 0.000 *** |
| MolYsis | 1.0 (0.5, 1.6) | 0.002 ** | 0.2 (-0.2, 0.7) | 0.407 | 2.0 (1.7, 2.2) | 0.000 *** |
| QIAamp | 1.0 (0.5, 1.6) | 0.002 ** | 1.1 (0.6, 1.5) | 0.000 *** | 1.4 (1.2, 1.7) | 0.000 *** |
save_kable(tables2, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS2.html", self_contained = T)

Figure of sequencing result

Fig. S3. Figure of sequencing result

ii. How did the sequencing results change?

Fig. S3. Host depletion effects measured by shotgun metagenomic sequencing. (A) Raw DNA reads, (B) host mapped reads by bowtie2, (C) final reads of microbes, and (D) proportion of host mapped among total mapped reads.
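Each panel draws a point range with `fun.data = "mean_sdl"` and `fun.args = list(mult = 1)`, which corresponds to the group mean plus or minus one standard deviation. A base-R sketch of that summary on invented toy data follows. The helper `summarise_mean_sd` is hypothetical and is not the function ggplot2 calls internally.

```r
# Toy data; values are invented for illustration only.
toy <- data.frame(
  treatment = rep(c("Control", "HostZERO"), each = 3),
  log_reads = c(6.0, 6.5, 7.0, 7.0, 7.5, 8.0)
)

# Mean and mean +/- mult * SD, matching the y / ymin / ymax
# that a pointrange with mult = 1 would display.
summarise_mean_sd <- function(x, mult = 1) {
  c(y = mean(x), ymin = mean(x) - mult * sd(x), ymax = mean(x) + mult * sd(x))
}

# One row per treatment group, one column per summary value.
res <- t(sapply(split(toy$log_reads, toy$treatment), summarise_mean_sd))
res
```

Since the y-axes are log10-transformed, the ranges shown are standard deviations of the log10 values, not of the raw read counts.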

fs3a <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = log10(Raw_reads))) +
        geom_jitter(aes(col = treatment, x = treatment),
                    lwd = 0.2, alpha = 0.3, stroke = 0)+
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        ylab("log<sub>10</sub>(raw reads)") +
        labs(tag = "A") +
        facet_wrap(~sample_type, scale = "free_x") +
        guides(fill = guide_legend(nrow = 1))

#   - Host_mapped


fs3b <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = log10(Host_mapped))) +
        geom_jitter(aes(col = treatment, x = treatment),
                    lwd = 0.2, alpha = 0.3, stroke = 0)+
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        ylab("log<sub>10</sub>(host reads)") +
        labs(tag = "B") +
        facet_wrap(~sample_type, scale = "free_x") +
        guides(fill = guide_legend(nrow = 1))







#   - % Host (we have used Host_mapped/Raw_reads in prior papers)

#   - Final_reads

fs3c <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = log10(Final_reads))) +
        geom_jitter(aes(col = treatment, x = treatment),
                    lwd = 0.2, alpha = 0.3, stroke = 0)+
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        ylab("log<sub>10</sub>(final reads)") +
        labs(tag = "C") +
        facet_wrap(~sample_type, scale = "free_x") +
        guides(fill = guide_legend(nrow = 1))





#   - % Host (we have used Host_mapped/Raw_reads in prior papers)
fs3d <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = sample_type, y = sequencing_host_prop)) +
        geom_jitter(aes(col = treatment, x = treatment),
                    lwd = 0.2, alpha = 0.3, stroke = 0)+
        #geom_linerange(aes(xmin=lower.ci, xmax=upper.ci)) +
        stat_summary(aes(color = treatment, x = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        scale_x_discrete(name ="Treatment")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              axis.text.x = element_blank()) +
        
        ylab("Host ratio by sequencing") +
        labs(tag = "D") +
        facet_wrap(~sample_type, scale = "free_x") +
        guides(fill = guide_legend(nrow = 1))



figS3 <- ggarrange(fs3a, fs3b, fs3c, fs3d, common.legend = T, align = "hv")

figS3

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS3.png",   # Output file path
    width = 180, # The width of the plot in mm (see units below)
    height = 170, # The height of the plot in mm (see units below)
    units = "mm",
    res = 600
) #fixing multiple page issue

figS3
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
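The `png()` call above sets the figure size in millimetres at 600 dpi. As a rough check of the resulting image size, millimetres can be converted to pixels by dividing by 25.4 and multiplying by the resolution. The helper `mm_to_px` below is hypothetical, and the device may round slightly differently.

```r
# Hypothetical helper: approximate pixel size of a device given mm and dpi.
mm_to_px <- function(mm, dpi) {
  round(mm / 25.4 * dpi)  # 25.4 mm per inch
}

# Dimensions used for Figure S3 (180 mm x 170 mm at 600 dpi).
px <- mm_to_px(c(width = 180, height = 170), dpi = 600)
px
```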

iii. Was host DNA proportion of sequencing similar to that of qPCR? (secondary analysis)

Fig. S4. Host ratio qPCR vs sequencing

Peggy’s comment: metagenomics is the gold standard for % host, but most people don’t have the budget for deep sequencing. The secondary analysis is therefore to calculate the correlation between % host by qPCR and % host by metagenomics (this can be a supplementary figure, but the correlation should at least be mentioned in the text).

Fig. S4. (A) Correlation plot and (B) Bland-Altman plot between host DNA proportion measured with qPCR and shotgun metagenomic sequencing.
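The R² annotated in panel A comes from a simple linear regression of the sequencing-based host proportion on the qPCR-based one. For a regression with a single predictor, R² equals the squared Pearson correlation, so either value can be reported. A sketch on invented toy values (the vectors `qpcr` and `mngs` are not study data):

```r
# Toy paired measurements; values are invented for illustration only.
qpcr <- c(0.10, 0.35, 0.50, 0.80, 0.95)
mngs <- c(0.20, 0.30, 0.60, 0.85, 0.90)

# R-squared from the linear model, as used for the plot annotation.
fit <- lm(mngs ~ qpcr)
r_squared <- summary(fit)$r.squared

# Pearson correlation between the two measurements.
pearson_r <- cor(qpcr, mngs)

# With one predictor, R-squared is the square of Pearson's r.
c(r_squared = r_squared, pearson_r_squared = pearson_r^2)
```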

figS4a <- ggplot(sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")), aes(x = host_proportion, y = sequencing_host_prop, col = sample_type)) +
        geom_point() +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#33a02c", "#1f78b4")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        #scale_x_discrete(name ="Sample type")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15),
              legend.position = "top") +
        ylab("% Host DNA (mNGS)") +
        xlab("% Host DNA (qPCR)") +
        labs(col = "Sample type") +
        annotate(family = "sans",
                 geom='richtext',
                 x=0.5, y=1,
                 label = paste("R<sup>2</sup> = ",
                               lm(sequencing_host_prop ~ host_proportion,
                                  data = sample_data %>%
                                          data.frame %>% 
                                          subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")))
                                  %>% summary %>% .$r.squared %>% round(., 2) %>% format(nsmall = 2), sep = "")) +
        geom_smooth(method=lm , color="red", se=T, level = 0.95) +
        guides(fill = guide_legend(nrow = 1)) +
        labs(tag = "A")


bland_altman_data <- sample_data %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")) %>% data.frame %>%
        mutate(avg_two = (host_proportion + sequencing_host_prop)/2, # mean of the two methods
               diff_two = sequencing_host_prop - host_proportion) # difference: mNGS minus qPCR
        
figS4b <- ggplot(bland_altman_data, aes(x = avg_two, y = diff_two, col = sample_type)) +
        geom_point() +
        theme_classic (base_size = 12, base_family = "sans")+ 
        scale_color_manual(values = c("#e31a1c", "#33a02c", "#1f78b4")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        #scale_x_discrete(name ="Sample type")+
        theme(axis.title.y = element_markdown(),
              plot.tag = element_text(size = 15)) +
        ylab("Difference (mNGS - qPCR)") +
        xlab("% Host DNA (mean)") +
        geom_hline(yintercept = mean(bland_altman_data$diff_two), colour = "black", size = 0.5) +
        geom_hline(yintercept = mean(bland_altman_data$diff_two) - (1.96 * sd(bland_altman_data$diff_two)), colour = "black", size = 0.5, linetype = "dashed") +
        geom_hline(yintercept = mean(bland_altman_data$diff_two) + (1.96 * sd(bland_altman_data$diff_two)), colour = "black", size = 0.5, linetype = "dashed") +
        labs(tag = "B")
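
The solid and dashed lines in panel B are the Bland-Altman bias (mean difference) and the 95% limits of agreement (bias ± 1.96 SD). A minimal base-R sketch of that arithmetic, using made-up paired values (`toy_qpcr`, `toy_mngs` and the `ba_*` names are hypothetical, not study data):

```r
# Hypothetical paired host-DNA proportions (not study data)
toy_qpcr <- c(0.90, 0.80, 0.95, 0.60, 0.99)
toy_mngs <- c(0.92, 0.78, 0.97, 0.66, 0.98)

ba_diff <- toy_mngs - toy_qpcr        # same sign convention as diff_two (mNGS - qPCR)
ba_mean <- (toy_mngs + toy_qpcr) / 2  # x-axis of the Bland-Altman plot

ba_bias <- mean(ba_diff)                           # solid horizontal line
ba_loa  <- ba_bias + c(-1.96, 1.96) * sd(ba_diff)  # dashed horizontal lines

round(c(bias = ba_bias, lower = ba_loa[1], upper = ba_loa[2]), 3)
```

A bias close to zero with narrow limits would suggest that the two measurements agree; the plot itself is still needed to see whether the difference drifts with the mean.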



figS4 <- ggarrange(figS4a, figS4b, common.legend = T, ncol = 1, align = "hv")

figS4

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS4.png",   # Path of the output file
    width = 180, # The width of the plot in mm (see units below)
    height =180, # The height of the plot in mm (see units below)
    units = "mm",
    res = 600
) #fixing multiple page issue

figS4
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2

3.2. Quality control of sequencing data

iv. Were there any contaminants in the sequencing results? (Do these host depletion methods introduce contamination?)

decontam

Table S10. decontam - stratified by sample_type

Table S10. Summary table of potential contaminants and their prevalence across all samples (N = 113) and within negative controls (total N = 31). Contaminants were identified using the decontam (ref. 37) combined method, which merges the prevalence-based and frequency-based scores with Fisher's method, using the 16S qPCR bacterial DNA concentration as the total bacterial load and a classification threshold of 0.1. The analyses were stratified by sample type.
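
As an illustration of how a "combined" score can be formed from two per-taxon probabilities, the sketch below applies Fisher's method in base R. It is a simplified sketch under stated assumptions, not the decontam source code: the taxa, the probabilities `p_freq` and `p_prev`, and the object names are hypothetical.

```r
# Hypothetical per-taxon scores from a frequency-based and a
# prevalence-based test (not study data)
p_freq <- c(taxon_a = 0.02, taxon_b = 0.60, taxon_c = 0.30)
p_prev <- c(taxon_a = 0.10, taxon_b = 0.70, taxon_c = 0.01)

# Fisher's method: -2 * sum(log(p)) is compared with a chi-squared
# distribution with 2k degrees of freedom (k = 2 scores, so df = 4)
p_comb <- pchisq(-2 * (log(p_freq) + log(p_prev)), df = 4, lower.tail = FALSE)

toy_threshold <- 0.1  # same value as the threshold argument used in this section
data.frame(p_freq, p_prev, p_comb, contaminant = p_comb < toy_threshold)
```

In this toy input, `taxon_a` and `taxon_c` fall below the threshold and `taxon_b` does not, which shows that one strong signal can be enough to flag a taxon under the combined score.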

#Stratified by sample type
neg_number_decontam <- 
        phyloseq_unfiltered$phyloseq_count %>% 
        subset_samples(sample_type == "Neg.") %>%
        subset_samples(S.obs != 0) %>%
        sample_names() %>%
        length

sample_number_decontam <- 
        phyloseq_unfiltered$phyloseq_count %>% 
        subset_samples(sample_type != "Mock") %>%
        subset_samples(sample_type != "Neg.") %>%
        subset_samples(S.obs != 0) %>%
        sample_names() %>%
        length

prev_neg <- ((phyloseq_unfiltered$phyloseq_count %>% 
          subset_samples(sample_type == "Neg.") %>%
          otu_table %>% 
          data.frame(.)) != 0) %>%
        rowSums %>% data.frame %>% rename("Prevalence (negative controls)" = ".")

prev_all <- ((phyloseq_unfiltered$phyloseq_count %>%
                      subset_samples(sample_type != "Mock") %>%
                      subset_samples(sample_type != "Neg.") %>%
          otu_table %>% 
          data.frame(.)) != 0) %>%
        rowSums %>% data.frame %>% rename("Prevalence (all)" = ".")


sample_data(phyloseq_unfiltered$phyloseq_rel)$is.neg <- grepl("Neg", sample_data(phyloseq_unfiltered$phyloseq_rel)$sample_type)

phyloseq_decontam_bal <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(sample_type == "Neg." | sample_type == "BAL")
phyloseq_decontam_ns <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(sample_type == "Neg." | sample_type == "Nasal")
phyloseq_decontam_spt <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(sample_type == "Neg." | sample_type == "Sputum")


contaminant_combined_bal <- 
data.frame("BAL", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_bal, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)

contaminant_combined_ns <- 
data.frame("Nasal swab", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_ns, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)


contaminant_combined_spt <- 
data.frame("Sputum", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_spt, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)

contaminants <- rbind(contaminant_combined_bal, contaminant_combined_ns, contaminant_combined_spt)

names(contaminants) <- c("Sample type", "Taxa")


neg_number_decontam
## [1] 31
sample_number_decontam
## [1] 93
merged_contaminants <- merge(contaminants, prev_all %>% 
                                     rownames_to_column("Taxa"), by = "Taxa") %>%
        merge(., prev_neg %>%
                      rownames_to_column("Taxa"), by = "Taxa") %>%
       dplyr::select(c("Sample type",
                       "Taxa",
                       "Prevalence (all)",
                       "Prevalence (negative controls)")) %>%
        mutate(`Prevalence (all)` = paste0(`Prevalence (all)`,
                                           " (",
                                           round(`Prevalence (all)`/
                                                         sample_number_decontam * 100, 2),
                                           "%)"),
                `Prevalence (negative controls)` = paste0(`Prevalence (negative controls)`,
                                           " (",
                                           round(`Prevalence (negative controls)`/
                                                         neg_number_decontam * 100, 2),
                                           "%)")) %>%
        rename(`Prevalence (all) N = 93` = `Prevalence (all)`,
               `Prevalence (negative controls) N = 31` = `Prevalence (negative controls)`) %>%
        .[order(.$"Sample type", .$"Taxa"),] %>%
        remove_rownames()
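
The prevalence columns above are counts of samples in which a taxon is non-zero, pasted together with a percentage. A minimal base-R sketch of the same bookkeeping on a made-up matrix (`toy_counts` and the taxon and sample names are hypothetical):

```r
# Hypothetical count matrix: taxa in rows, samples in columns (not study data)
toy_counts <- matrix(c(5, 0, 3, 0,
                       0, 0, 0, 2,
                       1, 4, 2, 7),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("taxon_a", "taxon_b", "taxon_c"),
                                     paste0("sample_", 1:4)))

toy_prev <- rowSums(toy_counts != 0)  # number of samples with a non-zero count
toy_n    <- ncol(toy_counts)          # denominator for the percentage

toy_label <- paste0(toy_prev, " (", round(toy_prev / toy_n * 100, 2), "%)")
names(toy_label) <- rownames(toy_counts)
toy_label
```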


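species_italic2 <- function(data){
  # Format taxon names as HTML: italicise the name, but keep " sp." upright
  data <- gsub("_", " ", data)
  data <- gsub("[]]|[[]", "", data)
  data <- gsub(" group", "", data)
  # Match " sp" only as a whole word, and close the italic span before it
  data <- gsub(" sp( |$)", "</em> sp.\\1", data)
  # Add a closing tag only when the italic span was not already closed above
  data <- ifelse(grepl("</em>", data, fixed = TRUE), paste("<em>", data, sep = ""), paste("<em>", data, "</em>", sep = ""))
  data
}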

Decontam result

Stratified-BAL

isContaminant(phyloseq_decontam_bal, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) 

Stratified-Nasal

isContaminant(phyloseq_decontam_ns, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) 

Stratified-Sputum

isContaminant(phyloseq_decontam_spt, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) 

Decontam - stratified by treatment

#Stratified by treatment

prev_neg <- ((phyloseq_unfiltered$phyloseq_count %>% 
          subset_samples(sample_type == "Neg.") %>%
          otu_table %>% 
          data.frame(.)) != 0) %>%
        rowSums %>% data.frame %>% rename("Prevalence (negative controls)" = ".")

prev_all <- ((phyloseq_unfiltered$phyloseq_count %>%
                      subset_samples(sample_type != "Mock") %>%
                      subset_samples(sample_type != "Neg.") %>%
          otu_table %>% 
          data.frame(.)) != 0) %>%
        rowSums %>% data.frame %>% rename("Prevalence (all)" = ".")


sample_data(phyloseq_unfiltered$phyloseq_rel)$is.neg <- grepl("Neg", sample_data(phyloseq_unfiltered$phyloseq_rel)$sample_type)

phyloseq_decontam_lypma <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(treatment == "Untreated" | treatment == "lyPMA")

phyloseq_decontam_ben <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(treatment == "Untreated" | treatment == "Benzonase")

phyloseq_decontam_hostzero <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(treatment == "Untreated" | treatment == "HostZero")

phyloseq_decontam_molysis <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(treatment == "Untreated" | treatment == "MolYsis")

phyloseq_decontam_qiaamp <- phyloseq_unfiltered$phyloseq_rel %>% 
        subset_samples(S.obs != 0) %>%
        subset_samples(treatment == "Untreated" | treatment == "QIAamp")


contaminant_combined_lypma <- 
data.frame("lyPMA", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_lypma, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)

contaminant_combined_ben <- 
data.frame("Benzonase", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_ben, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)


contaminant_combined_hostzero <- 
data.frame("HostZero", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_hostzero, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)


contaminant_combined_molysis <- 
data.frame("MolYsis", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_molysis, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)


contaminant_combined_QIAamp <- 
data.frame("QIAamp", fix.empty.names = F, 
        isContaminant(phyloseq_decontam_qiaamp, method="combined", neg = "is.neg", threshold = 0.1, conc = "DNA_bac_ng_uL") %>% subset(.,.$contaminant) %>% row.names
)

contaminants_strat_treat <- rbind(contaminant_combined_lypma,
                      contaminant_combined_ben, 
                      contaminant_combined_hostzero,
                      contaminant_combined_molysis,
                      contaminant_combined_QIAamp
                      )

names(contaminants_strat_treat) <- c("Treatment", "Taxa")

12 taxa were identified as contaminants. 7 taxa were identified as contaminants when stratified by sample type.

Decontam summary

Summary table of potential contaminants with all sample types and stratified sample types in all methods (prevalence, frequency, and combined)

tableS10 <- merged_contaminants %>% 
        mutate(Taxa = species_italic2(Taxa)
               ) %>%
        kbl(format = "html", escape = F) %>%
        kable_styling(full_width = 0, html_font = "sans") 
tableS10
| Sample type | Taxa | Prevalence (all) N = 93 | Prevalence (negative controls) N = 31 |
|---|---|---|---|
| BAL | Cupriavidus sp. | 82 (88.17%) | 25 (80.65%) |
| BAL | Cutibacterium acnes | 53 (56.99%) | 31 (100%) |
| BAL | Sutterella parvirubra | 48 (51.61%) | 1 (3.23%) |
| Nasal swab | Cupriavidus sp. | 82 (88.17%) | 25 (80.65%) |
| Nasal swab | Sutterella parvirubra | 48 (51.61%) | 1 (3.23%) |
| Sputum | Collinsella intestinalis | 30 (32.26%) | 0 (0%) |
| Sputum | Cupriavidus sp. | 82 (88.17%) | 25 (80.65%) |
| Sputum | Cutibacterium acnes | 53 (56.99%) | 31 (100%) |
| Sputum | Leptotrichia sp. oral taxon 212 | 4 (4.3%) | 0 (0%) |
| Sputum | Rothia aeria | 16 (17.2%) | 0 (0%) |
| Sputum | Streptococcus infantis | 27 (29.03%) | 0 (0%) |
save_kable(tableS10, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS10.html", self_contained = T)

Filtering taxa

Prevalence filtering

Prevalence & abundance filtering was conducted

  1. M&M - Prevalence filtering

    1.  Taxa with a prevalence below 5% were removed, unless their mean relative abundance was above the 0.75 quantile.
            a.  Information in the main text
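
The filtering rule above can be sketched on a small made-up matrix: a taxon is flagged only when it is both rare (present in fewer than 5% of samples) and low in abundance (mean relative abundance below the 0.75 quantile of taxa that occur at least once). All `toy_*` objects and taxon names are hypothetical, and the rows are illustrative values rather than true compositions.

```r
# Hypothetical relative abundances: taxa in rows, 40 samples in columns
toy_n <- 40
toy_abd <- rbind(
  common    = rep(0.01, toy_n),
  mid       = c(rep(0.02, 10), rep(0, 30)),
  mid2      = c(rep(0.01, 5),  rep(0, 35)),
  rare_low  = c(0.001, rep(0, 39)),
  rare_high = c(0.9,   rep(0, 39))
)

toy_prev     <- rowSums(toy_abd > 0)  # number of samples with the taxon
toy_mean_abd <- rowMeans(toy_abd)     # mean relative abundance

# Abundance cutoff: 0.75 quantile among taxa that occur at least once
toy_cutoff <- unname(quantile(toy_mean_abd[toy_prev != 0], 0.75))

# Flag = rare AND low abundance (both conditions must hold)
toy_flag <- toy_prev < toy_n * 0.05 & toy_mean_abd < toy_cutoff
names(toy_flag)[toy_flag]
```

Here `rare_high` occurs in a single sample but is kept because its mean abundance is above the cutoff, whereas `rare_low` fails both criteria and is flagged.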
phyloseq_unfiltered$phyloseq_rel <- transform_sample_counts(phyloseq_unfiltered$phyloseq_rel,
                                                            function(x){x/sum(x)})
taxa_qc <- data.frame("species" =
                              otu_table(
                                      subset_samples(
                                              phyloseq_unfiltered$phyloseq_rel,S.obs != 0 &
                                                      sample_type %in% c("Mock", "BAL", "Nasal", "Sputum"))) %>%
                              t() %>% colnames(),
                      "prevalence" =
                              ifelse(subset_samples(phyloseq_unfiltered$phyloseq_rel, S.obs != 0 & 
                                                            sample_type %in% c("Mock", "BAL", "Nasal", "Sputum")) %>%
                                             otu_table() > 0, 1, 0)%>% 
                              t() %>%
                              colSums(), #Prevalence of taxa
                      "mean_rel_abd" = 
                              subset_samples(phyloseq_unfiltered$phyloseq_rel,
                                             S.obs != 0 & sample_type %in% c("Mock", "BAL", "Nasal", "Sputum")) %>%
                              otu_table() %>%
                              t() %>%
                              colMeans(na.rm = T) #mean relative abundance
)


function_qc <- data.frame("function" =
                                  otu_table(
                                          subset_samples(
                                                  phyloseq_unfiltered$phyloseq_path_rpk,
                                                           S.obs != 0 & 
                                                          sample_type %in% 
                                                          c("Mock", "BAL", "Nasal", "Sputum")
                                                  )
                                          ) %>% 
                                  t() %>% 
                                  colnames(),
                          "prevalence" = 
                                  ifelse(subset_samples(phyloseq_unfiltered$phyloseq_path_rpk,
                                                        S.obs != 0 & 
                                                                sample_type %in% 
                                                                c("Mock", "BAL", "Nasal", "Sputum")
                                                        ) %>% 
                                                 otu_table() > 0, 
                                         1, 
                                         0
                                         ) %>% 
                                  t() %>% 
                                  colSums(), #Prevalence of functions
                          "mean_rpk" = 
                                  subset_samples(phyloseq_unfiltered$phyloseq_path_rpk, 
                                                 S.obs != 0 & 
                                                         sample_type %in% 
                                                         c("Mock", "BAL", "Nasal", "Sputum")
                                                 ) %>% 
                                  otu_table() %>% 
                                  t() %>% 
                                  colMeans(na.rm = T), #mean RPK
                          unidentified = 
                                  ifelse((subset_samples(phyloseq_unfiltered$phyloseq_path_rpk,
                                              S.obs != 0 & sample_type %in% c("Mock", "BAL", "Nasal", "Sputum")) %>% 
                                       otu_table() > 0) %>%
                                      row.names() %in% c("UNMAPPED", "UNINTEGRATED")
                              , 1, 0)
)

red_flag_taxa <- data.frame(species = taxa_qc$species,
                            prevalence = taxa_qc$prevalence,
                            mean_rel_abd = taxa_qc$mean_rel_abd,
                            red_flag_prev_abd = 
                                    ifelse(taxa_qc$prevalence < 
                                                   otu_table(
                                                           subset_samples(
                                                                   phyloseq_unfiltered$phyloseq_rel,
                                                                   S.obs != 0 & 
                                                                           sample_type %in% 
                                                                           c("Mock", "BAL", "Nasal", "Sputum"))) %>%
                                                               t %>% rownames() %>%
                                                   length * 0.05 &
                                                   #Removing taxa with zero prevalence - taxa from nasal swabs
                                                   taxa_qc$mean_rel_abd <
                                                   taxa_qc %>%
                                                   subset(., .$prevalence != 0) %>%
                                                   .$mean_rel_abd %>%
                                                   quantile(., 0.75), 1,0),
                           red_flag_prev =
                                    ifelse(taxa_qc$prevalence < 
                                                   otu_table(
                                                           subset_samples(
                                                                   phyloseq_unfiltered$phyloseq_rel,
                                                                   S.obs != 0 & 
                                                                           sample_type %in% 
                                                                           c("Mock", "BAL", "Nasal", "Sputum"))) %>%
                                                               t %>% rownames() %>%
                                                   length * 0.05,
                                           1,
                                           0)) %>%
        mutate(red_flag_decontam = species %in% (contaminants$Taxa %>% unique()))

subset(red_flag_taxa, red_flag_taxa$red_flag_prev == 1 & red_flag_taxa$red_flag_prev_abd == 0)
#Unmapped functions were removed

red_flag_function <- 
        data.frame(function. = function_qc$function.,
                   prevalence = function_qc$prevalence,
                   mean_rel_abd = function_qc$mean_rpk,
                   red_flag_prev_abd = 
                           ifelse(function_qc$prevalence < 
                                          otu_table(
                                                  subset_samples(
                                                          phyloseq_unfiltered$phyloseq_path_rpk,
                                                          S.obs != 0 & 
                                                                  sample_type %in% 
                                                                  c("Mock", "BAL", "Nasal", "Sputum"))) %>%
                                          t %>% 
                                          rownames() %>%
                                          length * 0.05 &
                                          #Removing taxa with zero prevalence - taxa from nasal swabs
                                          function_qc$mean_rpk <
                                          function_qc %>%
                                          subset(., .$prevalence != 0) %>%
                                          .$mean_rpk %>%
                                          quantile(., 0.75), 1,0),
                   red_flag_prev =
                           ifelse(function_qc$prevalence <
                                          otu_table(
                                                  subset_samples(
                                                          phyloseq_unfiltered$phyloseq_path_rpk,
                                                          S.obs != 0 & 
                                                                  sample_type %in% 
                                                                  c("Mock", "BAL", "Nasal", "Sputum"))) %>%
                                          t %>% 
                                          rownames() %>%
                                          length * 0.05,
                                          1,
                                          0)) %>%
        mutate(red_flag_prev_abd = case_when(function. %in% c("UNMAPPED", "UNINTEGRATED") ~ 1,
                                     .default = red_flag_prev_abd))

subset(red_flag_function, red_flag_function$red_flag_prev == 1 & red_flag_function$red_flag_prev_abd == 0)
#decontaminated phyloseq 
phyloseq_decontam <- phyloseq
phyloseq_decontam$phyloseq_count <- prune_taxa(subset(red_flag_taxa,
                                           red_flag_taxa$red_flag_prev_abd != 1 &
                                                   !red_flag_taxa$red_flag_decontam)$species,
                                    phyloseq$phyloseq_count)

phyloseq_decontam$phyloseq_rel <- prune_taxa(subset(red_flag_taxa,
                                           red_flag_taxa$red_flag_prev_abd != 1 &
                                                   !red_flag_taxa$red_flag_decontam)$species,
                                    phyloseq$phyloseq_rel) %>%
        transform_sample_counts(., function(x){x/sum(x)})
        
#phyloseq for analysis
phyloseq$phyloseq_count <- prune_taxa(subset(red_flag_taxa,
                                           red_flag_taxa$red_flag_prev_abd != 1)$species,
                                    phyloseq$phyloseq_count)
phyloseq$phyloseq_rel <- prune_taxa(subset(red_flag_taxa,
                                           red_flag_taxa$red_flag_prev_abd != 1)$species,
                                    phyloseq$phyloseq_rel) %>%
        transform_sample_counts(., function(x){x/sum(x)})


phyloseq$phyloseq_path_rpk <- prune_taxa(subset(red_flag_function, red_flag_function$red_flag_prev_abd != 1)$function., phyloseq$phyloseq_path_rpk)

#phyloseq$tree_phyloseq_count <- prune_taxa(subset(red_flag_taxa,
                                           #red_flag_taxa$red_flag_prev_abd != 1 & !red_flag_taxa$red_flag_decontam_prev)$species,
                                    #phyloseq$tree_phyloseq_count)

#phyloseq$tree_phyloseq_rel <- prune_taxa(subset(red_flag_taxa,
                                           #red_flag_taxa$red_flag_prev_abd != 1 & !red_flag_taxa$red_flag_decontam_prev)$species,
                                    #phyloseq$tree_phyloseq_rel)

Filtration result

Taxa were filtered when red_flag_prev_abd == 1.
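
After flagged taxa are dropped, the relative abundances are rescaled so that each sample sums to 1 again (the `transform_sample_counts(., function(x){x/sum(x)})` step). A minimal base-R sketch of that step on made-up values (`toy_rel`, `toy_red_flag` and the taxon names are hypothetical):

```r
# Hypothetical relative abundances: taxa in rows, samples in columns
toy_rel <- cbind(sample_1 = c(taxon_a = 0.6, taxon_b = 0.3, taxon_c = 0.1),
                 sample_2 = c(taxon_a = 0.2, taxon_b = 0.0, taxon_c = 0.8))

toy_red_flag <- c(taxon_a = 0, taxon_b = 0, taxon_c = 1)  # 1 = remove

# Keep unflagged taxa, then divide every column by its new total
toy_kept   <- toy_rel[toy_red_flag != 1, , drop = FALSE]
toy_renorm <- sweep(toy_kept, 2, colSums(toy_kept), "/")
toy_renorm
```

Note that a sample dominated by a removed taxon (`sample_2` here) changes a lot after rescaling, which is worth keeping in mind when reading filtered relative abundances.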

red_flag_taxa

Extras

Alpha diversity calculation

#Calculation of alpha diversity indices for filtered samples

alpha_diversity <- function(data) {
        otu_table <- otu_table(data) #%>% .[, colSums(.) !=0]
        S.obs <- rowSums(t(otu_table) != 0)
        sample_data <- sample_data(data)
        data_evenness <- vegan::diversity(t(otu_table)) / log(vegan::specnumber(t(otu_table))) # calculate evenness index using vegan package
        data_shannon <- vegan::diversity(t(otu_table), index = "shannon") # calculate Shannon index using vegan package
        data_hill <- exp(data_shannon)                           # calculate Hill number of order 1 (exponential of the Shannon index)
        data_dominance <- microbiome::dominance(otu_table, index = "all", rank = 1, aggregate = TRUE) # dominance (Berger-Parker index), etc.
        data_invsimpson <- vegan::diversity(t(otu_table), index = "invsimpson")                          # calculate inverse Simpson index using vegan package
        alpha_diversity <- cbind(S.obs, data_shannon, data_hill, data_invsimpson, data_evenness,data_dominance) # combine all indices in one data table
        sample_data <- merge(data.frame(sample_data), alpha_diversity, by = 0, all = T) %>% column_to_rownames(var = "Row.names")
}
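
For reference, the indices collected by `alpha_diversity()` can be written out by hand for a single sample. The sketch below uses base R only and a made-up count vector (`toy_sample` and the `toy_*` names are hypothetical); it is meant to show the definitions, not to replace the vegan and microbiome calls.

```r
# Hypothetical counts for one sample (not study data)
toy_sample <- c(taxon_a = 50, taxon_b = 30, taxon_c = 20, taxon_d = 0)
toy_p <- toy_sample[toy_sample > 0] / sum(toy_sample)  # proportions of observed taxa

toy_richness   <- length(toy_p)                  # observed richness (S.obs)
toy_shannon    <- -sum(toy_p * log(toy_p))       # Shannon index (natural log)
toy_hill       <- exp(toy_shannon)               # Hill number of order 1
toy_invsimpson <- 1 / sum(toy_p^2)               # inverse Simpson index
toy_evenness   <- toy_shannon / log(toy_richness) # Pielou's evenness
toy_bp         <- max(toy_p)                     # Berger-Parker dominance

round(c(S.obs = toy_richness, shannon = toy_shannon, hill = toy_hill,
        invsimpson = toy_invsimpson, evenness = toy_evenness,
        berger_parker = toy_bp), 3)
```

Evenness is undefined when only one taxon is observed (division by `log(1) = 0`), so samples with `S.obs` of 0 or 1 should be expected to yield `NaN` or infinite values for that index.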
#sample_data(phyloseq$phyloseq_count) <- sample_data(alpha_diversity(phyloseq$phyloseq_count))
sample_data(phyloseq$phyloseq_rel) <- sample_data(alpha_diversity(phyloseq$phyloseq_count))
sample_data(phyloseq$phyloseq_count) <-sample_data(alpha_diversity(phyloseq$phyloseq_count)) 
sample_data(phyloseq$phyloseq_path_rpk) <- sample_data(alpha_diversity(phyloseq$phyloseq_path_rpk))  

sample_data(phyloseq_tv) <- sample_data(alpha_diversity(phyloseq_tv))  

sample_data(phyloseq_decontam$phyloseq_rel) <- sample_data(alpha_diversity(phyloseq_decontam$phyloseq_count))  

sample_data(phyloseq_decontam$phyloseq_count) <- sample_data(alpha_diversity(phyloseq_decontam$phyloseq_count))

sample_data <- sample_data(phyloseq$phyloseq_count)

Alpha diversity calculation data

sample_data(phyloseq$phyloseq_rel) %>% 
        data.frame() %>%
        dplyr::select(c("S.obs", "data_shannon", "data_invsimpson")) 

Table 1 (revised)

Revised Table 1 was constructed after prevalence & abundance filtering

Table 1. Sequencing results stratified by sample type and host depletion treatment. Columns show the number of samples in each experimental group, human and bacterial DNA measured by qPCR, the number of samples that did not pass library QC or had no microbial mapped reads, QC'd reads, % host-mapped reads, final reads, and microbial species, function, and viral species richness. Microbial richness was calculated after applying prevalence and abundance filtering. Samples that failed sequencing were excluded from the calculation. Values are shown as N (%) or median (interquartile range).
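
Most cells of Table 1 follow the same "median (Q1, Q3)" pattern. One way to express that pattern once is a small helper such as the hypothetical `fmt_median_iqr()` below; this is a sketch of the formatting only and is not the function used to build the table.

```r
# Hypothetical helper: format a numeric vector as "median (Q1, Q3)"
fmt_median_iqr <- function(x, digits = 1) {
  q <- quantile(x, c(0.5, 0.25, 0.75), na.rm = TRUE)
  # Round, force a fixed number of decimals, and add thousands separators
  q <- format(round(q, digits), nsmall = digits, big.mark = ",", trim = TRUE)
  paste0(q[1], " (", q[2], ", ", q[3], ")")
}

# Made-up values (not study data); the NA is ignored
fmt_median_iqr(c(1200, 1500, 9800, 40, 2500, NA))
```

Inside `summarise()` such a helper could be called once per column, which keeps the rounding and separator settings identical across the table.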

sample_data(phyloseq$phyloseq_count)$sequencing_fail <- 
        ifelse(phyloseq$phyloseq_count %>% 
                       sample_data %>%
                       .$S.obs == 0,
               1,
               0)

sample_data(phyloseq$phyloseq_count) <- 
        
        merge(phyloseq$phyloseq_count %>% 
                      sample_data %>%
                      data.frame(check.names = F),
              phyloseq$phyloseq_path_rpk %>% #extracting functional richness as dataframe
                      sample_data %>%
                      data.frame(check.names = F) %>% 
                     dplyr::select(c("S.obs")) %>%
                      rename(F.obs = "S.obs"),
              by = 0
        ) %>%
                column_to_rownames("Row.names")

table1 <- sample_data(phyloseq$phyloseq_count) %>% data.frame() %>% 
        dplyr::filter(sample_type %in% c("Sputum", "Nasal", "BAL")) %>% 
        mutate(S.obs = case_when(S.obs==0 ~ NA,
                                 .default = S.obs),
               F.obs = case_when(F.obs==0 ~ NA,
                                 .default = F.obs),
               V.obs = case_when(V.obs==0 ~ NA,
                                 .default = V.obs)) %>% 
        group_by (sample_type, treatment) %>%
        summarise(`N` = n(),
            #      `Total DNA <br>ng/µL` = paste(format(round(median(picogreen_ng_ul),2), nsmall = 2, big.mark = ","), "<br>(", format(round(quantile(picogreen_ng_ul, 0.25),2), nsmall = 2, big.mark = ","), ", ", format(round(quantile(picogreen_ng_ul, 0.75),2), nsmall = 2, big.mark = ","), ")", sep = ""),
               `Human DNA <br>pg/µL` = paste(format(round(median(DNA_host_ng_uL*1000),1), nsmall = 1, big.mark = ","), " (", format(round(quantile(DNA_host_ng_uL*1000, 0.25),1), nsmall = 1, big.mark = ","), ", ", format(round(quantile(DNA_host_ng_uL*1000, 0.75),1), nsmall = 1, big.mark = ","), ")", sep = ""),
               `Bacterial DNA <br>pg/µL` = paste(format(round(median(DNA_bac_ng_uL*1000),1), nsmall = 1, big.mark = ","), " (", format(round(quantile(DNA_bac_ng_uL*1000, 0.25),1), nsmall = 1, big.mark = ","), ", ", format(round(quantile(DNA_bac_ng_uL*1000, 0.75),1), nsmall = 1, big.mark = ","), ")", sep = ""),
              `Sequencing fail<br>N (%)` = paste(sum(lib_failed + sequencing_fail), " (", sum(lib_failed +sequencing_fail) / n() * 100, " %)", sep = ""),
              `QC'd reads<br>reads x 10<sup>6</sup>` = paste(format(round(median(Reads_after_trim/1000000),1), nsmall = 1, big.mark = ","), " (", format(round(quantile(Reads_after_trim/1000000, 0.25),1), nsmall = 1, big.mark = ","), ", ", format(round(quantile(Reads_after_trim/1000000, 0.75),1), nsmall = 1, big.mark = ","), ")", sep = ""),
              `Host reads<br>%` = paste(format(round(median(sequencing_host_prop*100),1),
                                                nsmall = 1, big.mark = ","),
                                         " (",
                                         format(round(quantile(sequencing_host_prop * 100,
                                                               0.25),
                                                      1),
                                                nsmall = 1,
                                                big.mark = ","), 
                                         ", ", 
                                         format(round(quantile(sequencing_host_prop * 100, 0.75),1), 
                                                nsmall = 1, 
                                                big.mark = ","), 
                                         ")", 
                                         sep = ""),
             `Final reads<br>reads x 10<sup>6</sup>` = paste(format(round(median(Final_reads/1000000,
                                                                                 na.rm = T),
                                                                          1),
                                                                    nsmall = 1, big.mark = ","),
                                                             " (", 
                                                             format(round(quantile(Final_reads/1000000,
                                                                                            0.25,
                                                                                            na.rm = T),
                                                                                   1),
                                                                             nsmall = 1,
                                                                             big.mark = ","),
                                                             ", ",
                                                             format(round(quantile(Final_reads/1000000, 
                                                                                   0.75, 
                                                                                   na.rm = T),
                                                                          1), 
                                                                    nsmall = 1, 
                                                                    big.mark = ","), 
                                                             ")",
                                                             sep = ""),
             `Microbial<br>species<br>richness` = paste(median(S.obs, na.rm = T) %>% round,
                                                             " (", 
                                                             quantile(S.obs, 0.25, na.rm = T) %>% round,
                                                             ", ",
                                                             quantile(S.obs, 0.75, na.rm = T) %>% round,
                                                             ")",
                                                             sep = ""),
            `Microbial function<br>richness` = paste(median(F.obs, na.rm = T) %>% round,
                                                             " (", 
                                                             quantile(F.obs, 0.25, na.rm = T) %>% round,
                                                             ", ",
                                                             quantile(F.obs, 0.75, na.rm = T) %>% round,
                                                             ")",
                                                             sep = ""),
                `Viral<br>species<br>richness` = paste(median(V.obs, na.rm = T) %>% round,
                                                             " (", 
                                                             quantile(V.obs, 0.25, na.rm = T) %>% round,
                                                             ", ",
                                                             quantile(V.obs, 0.75, na.rm = T) %>% round,
                                                             ")",
                                                             sep = "")
              
            ) %>% 
        data.frame(check.names = F) %>% 
        arrange(sample_type, treatment) %>%
        rename(`Sample` = sample_type, Treatment = treatment) %>%
        mutate_all(linebreak) %>%
        kbl(format = "html", escape = FALSE) %>%
        kable_styling(full_width = FALSE, html_font = "sans")

table1 
Sample | Treatment | N | Human DNA (pg/µL) | Bacterial DNA (pg/µL) | Sequencing fail, N (%) | QC'd reads (reads × 10^6) | Host reads (%) | Final reads (reads × 10^6) | Microbial species richness | Microbial function richness | Viral species richness
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
BAL | Untreated | 5 | 1,512.8 (1,237.9, 9,836.4) | 12.6 (10.5, 37.8) | 1 (20 %) | 129.5 (52.5, 129.9) | 99.7 (99.6, 99.7) | 0.3 (0.3, 0.4) | 3 (2, 5) | 8 (7, 56) | 1 (1, 1)
BAL | lyPMA | 5 | 2,139.7 (60.4, 6,255.6) | 8.4 (0.3, 17.4) | 2 (40 %) | 46.7 (28.6, 110.0) | 99.1 (97.8, 99.5) | 0.6 (0.4, 1.0) | 7 (6, 8) | 86 (54, 141) | 2 (1, 2)
BAL | Benzonase | 5 | 59.6 (47.8, 70.1) | 0.9 (0.7, 2.3) | 0 (0 %) | 149.3 (129.7, 183.7) | 98.8 (98.7, 98.9) | 1.7 (1.6, 2.2) | 6 (5, 7) | 162 (116, 179) | 14 (9, 20)
BAL | HostZERO | 5 | 6.8 (2.3, 7.5) | 0.4 (0.3, 1.1) | 1 (20 %) | 31.9 (18.4, 35.1) | 83.7 (76.8, 87.2) | 2.4 (1.3, 8.2) | 8 (7, 11) | 226 (132, 237) | 17 (10, 24)
BAL | MolYsis | 5 | 7.6 (6.6, 25.2) | 2.0 (0.3, 4.6) | 1 (20 %) | 39.0 (29.0, 39.3) | 92.5 (92.5, 93.6) | 2.9 (1.3, 15.6) | 17 (7, 45) | 232 (226, 264) | 23 (12, 37)
BAL | QIAamp | 5 | 33.1 (32.0, 79.5) | 0.5 (0.3, 1.9) | 0 (0 %) | 132.4 (119.6, 137.5) | 98.3 (92.3, 98.6) | 2.6 (1.0, 10.2) | 8 (6, 14) | 231 (32, 234) | 27 (15, 39)
Nasal | Untreated | 10 | 340.2 (202.3, 685.8) | 22.9 (16.9, 26.6) | 0 (0 %) | 106.2 (63.7, 138.7) | 94.1 (92.8, 97.9) | 4.8 (1.0, 8.7) | 12 (9, 13) | 152 (125, 167) | 10 (5, 13)
Nasal | lyPMA | 5 | 2.6 (0.8, 9.2) | 0.3 (0.3, 0.3) | 4 (80 %) | 7.9 (6.9, 9.7) | 91.2 (35.6, 91.6) | 0.7 (0.6, 0.8) | 5 (5, 7) | 146 (133, 181) | 3 (2, 3)
Nasal | Benzonase | 5 | 12.8 (1.9, 78.8) | 6.1 (5.4, 10.2) | 0 (0 %) | 47.1 (41.7, 53.2) | 78.7 (77.8, 94.8) | 2.8 (2.6, 10.4) | 8 (7, 14) | 200 (136, 203) | 4 (3, 5)
Nasal | HostZERO | 5 | 0.5 (0.1, 0.7) | 7.6 (3.3, 15.8) | 2 (40 %) | 24.5 (11.7, 55.2) | 8.9 (2.7, 30.4) | 24.3 (9.7, 50.3) | 20 (19, 20) | 224 (211, 226) | 15 (11, 17)
Nasal | MolYsis | 5 | 0.4 (0.0, 0.8) | 1.6 (1.1, 5.8) | 4 (80 %) | 8.1 (5.0, 34.9) | 49.9 (5.0, 78.4) | 3.2 (1.7, 25.3) | 12 (11, 21) | 210 (198, 216) | 6 (4, 18)
Nasal | QIAamp | 5 | 2.1 (0.9, 7.1) | 28.8 (24.7, 30.7) | 0 (0 %) | 56.2 (54.9, 58.5) | 20.1 (15.7, 23.2) | 46.3 (45.0, 46.7) | 18 (17, 20) | 223 (204, 227) | 17 (8, 21)
Sputum | Untreated | 5 | 39,231.5 (19,448.0, 59,430.9) | 245.3 (220.3, 311.0) | 0 (0 %) | 69.2 (68.0, 75.6) | 99.2 (98.9, 99.2) | 0.6 (0.6, 0.9) | 13 (10, 15) | 159 (157, 160) | 2 (2, 2)
Sputum | lyPMA | 5 | 9,779.5 (994.2, 11,437.5) | 97.8 (25.9, 100.9) | 0 (0 %) | 89.7 (42.0, 105.2) | 96.4 (92.5, 98.3) | 2.5 (1.5, 4.4) | 50 (43, 65) | 266 (259, 276) | 11 (7, 16)
Sputum | Benzonase | 5 | 154.3 (141.3, 349.0) | 33.2 (21.3, 53.0) | 0 (0 %) | 84.0 (82.0, 87.1) | 94.2 (92.9, 94.5) | 4.7 (4.5, 5.9) | 94 (84, 95) | 288 (282, 322) | 33 (29, 36)
Sputum | HostZERO | 5 | 49.4 (11.7, 57.7) | 38.9 (33.6, 39.0) | 0 (0 %) | 106.2 (61.6, 114.8) | 61.7 (37.5, 68.0) | 29.1 (23.6, 36.7) | 131 (112, 133) | 328 (313, 340) | 89 (67, 162)
Sputum | MolYsis | 5 | 13.6 (8.0, 26.3) | 28.4 (24.1, 30.4) | 0 (0 %) | 105.6 (90.8, 115.7) | 32.8 (17.0, 33.8) | 61.1 (55.6, 83.7) | 126 (123, 135) | 328 (326, 333) | 140 (93, 153)
Sputum | QIAamp | 5 | 241.6 (196.3, 273.5) | 64.3 (34.6, 71.0) | 0 (0 %) | 102.4 (100.9, 106.0) | 88.2 (68.9, 88.6) | 11.6 (11.3, 38.9) | 103 (90, 114) | 293 (291, 296) | 45 (42, 47)
save_kable(table1, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/table1.html", self_contained = T)
mock_otu <- subset_samples(phyloseq$phyloseq_count, sample_type =="Mock") %>% otu_table() %>% data.frame()

mock_otu_nonzero <- mock_otu[rowSums(mock_otu[])>0,]
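
The `median (Q1, Q3)` cells of table1 are each assembled with a separate `paste()` call. A minimal sketch of how that pattern could be wrapped in one helper is given here. The function name `fmt_med_iqr` and its `digits` argument are hypothetical and not part of the original script; `digits = 1` mimics the DNA and read columns, and `digits = 0` mimics the richness columns.

```r
# Hypothetical helper: format a numeric vector as "median (Q1, Q3)".
fmt_med_iqr <- function(x, digits = 0) {
  # 25th, 50th and 75th percentiles, ignoring missing values
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  # round, keep a fixed number of decimals, add a thousands separator
  f <- function(v) format(round(v, digits), nsmall = digits, big.mark = ",")
  paste0(f(q[2]), " (", f(q[1]), ", ", f(q[3]), ")")
}

fmt_med_iqr(c(1, 2, 3, 4, 5))
fmt_med_iqr(c(1000, 2000, 3000, NA), digits = 1)
```

Inside `summarise()`, a call such as `fmt_med_iqr(S.obs)` would then stand in for one of the repeated `paste(median(...), quantile(...), ...)` blocks.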

+ α Reproducing Amy’s decontamination

This code takes a while to run and is about 1,800 lines long, so it is not included in this document (it will be submitted separately).

Calculations used for main text

Figure 1 numbers

These tables report the median (IQR) of the qPCR results.

Figure 1 numbers (Host DNA) - treated vs untreated

These calculations provide the values quoted for Figure 1 in the main text, since the figure itself does not show exact values.

sample_data %>% 
        subset(., .$S.obs != 0) %>%
        group_by(sample_type, treated) %>%
        summarise(med = median(DNA_host_nondil + DNA_bac_nondil),
                  lo = quantile(DNA_host_nondil + DNA_bac_nondil, 0.25),
                  high = quantile(DNA_host_nondil + DNA_bac_nondil, 0.75))
sample_data %>% subset(., .$S.obs != 0) %>%
        group_by(sample_type, treated) %>% summarise(med = median(DNA_host_ng_uL), lo = quantile(DNA_host_ng_uL, 0.25), high = quantile(DNA_host_ng_uL, 0.75))

Figure 1 numbers (Host proportion) - treated vs untreated

sample_data %>% subset(., .$S.obs != 0) %>% group_by(sample_type, treated) %>% summarise(med = median(host_proportion), lo = quantile(host_proportion, 0.25), high = quantile(host_proportion, 0.75))

Figure 1 numbers (bacterial DNA) - treated vs untreated

sample_data %>% subset(., .$S.obs != 0) %>% group_by(sample_type, treated) %>% summarise(med = median(DNA_bac_ng_uL), lo = quantile(DNA_bac_ng_uL, 0.25), high = quantile(DNA_bac_ng_uL, 0.75))
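
The Figure 1 summaries above repeat the same median / 25th / 75th percentile pipeline for each variable. The sketch below shows one way that pattern could be shared. The helper name `summarise_med_iqr`, the toy data frame, and the use of `na.rm = TRUE` are assumptions for illustration; the original calls do not remove missing values.

```r
library(dplyr)

# Hypothetical helper: median and quartiles of one column within groups.
summarise_med_iqr <- function(data, value, ...) {
  data %>%
    group_by(...) %>%
    summarise(med  = median({{ value }}, na.rm = TRUE),
              lo   = quantile({{ value }}, 0.25, na.rm = TRUE, names = FALSE),
              high = quantile({{ value }}, 0.75, na.rm = TRUE, names = FALSE),
              .groups = "drop")
}

# Toy data (made-up values, not from the study)
toy <- data.frame(sample_type     = rep(c("BAL", "Nasal"), each = 3),
                  host_proportion = c(0.9, 0.8, 0.7, 0.2, 0.4, 0.6))

res <- summarise_med_iqr(toy, host_proportion, sample_type)
res
```

With the real data, the call would presumably look like `summarise_med_iqr(sample_data, DNA_bac_ng_uL, sample_type, treated)` after the same `S.obs != 0` filter.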

3.3. Effects of treatments on taxonomic composition

*Did taxonomic composition change?

Fig. 2 Overview of sequencing results

  • PSL comments: I would move this to the supplement and replace with a figure that demonstrates the relative abundance of the 10 most abundant species (stratified by sample type) and 10 most abundant KEGG functions. The whole point is to highlight why you would want to do metagenomics rather than amplicon sequencing. Consider using the following qualitative color palette from colorbrewer:

Fig. 2. Species-level relative abundances by sample type after prevalence and abundance filtering, showing the top 10 species in each sample type by mean abundance. (A) BAL, (B) nasal swabs, (C) sputum, and (D) mock community. Empty spaces indicate samples with no microbial reads, i.e., samples that failed sequencing.

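# Stacked bar plot from a phyloseq object (similar in structure to phyloseq::plot_bar()).
# scale_x_discrete(drop = F) keeps unused x-axis levels, so samples without
# reads still occupy an (empty) position in the plot.
# Note: aes_string() is deprecated in recent ggplot2 releases.
my_plot_bar = function (physeq, x = "Sample", y = "Abundance", fill = NULL, title = NULL, 
                        facet_grid = NULL) {
    mdf = psmelt(physeq)  # long format: one row per taxon-sample pair
    p = ggplot(mdf, aes_string(x = x, y = y, fill = fill))
    p = p + geom_bar(stat = "identity")
    p = p + theme(axis.text.x = element_text(angle = -90, hjust = 0)) +
            scale_x_discrete(drop = F)
    # optional faceting formula
    if (!is.null(facet_grid)) {
        p <- p + facet_grid(facet_grid)
    }
    # optional plot title
    if (!is.null(title)) {
        p <- p + ggtitle(title)
    }
    return(p)
}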

a <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock")) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock") %>% 
           subset_samples(., S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="species20") + 
        xlab("Sample") +
        ylab("") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(size = 6),
              legend.key.size = unit(3, "mm"),
              legend.title = element_text(size = 6),
              axis.text.x = element_text(color = "white")) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap (~ treatment, scales = "free_x", nrow = 1) +
        labs(tag = "A")
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
        
neg_bar_labels <- c(
                    `Untreated` = "Extraction",
                    `lyPMA` = "lyPMA",
                    `Benzonase` = "Benzonase",
                    `HostZERO` = "HostZERO",
                    `MolYsis` = "MolYsis",
                    `QIAamp` = "QIAamp"
                    )

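# Facet labeller built from the named vector above (relabels "Untreated" as "Extraction").
# as_labeller() replaces the old-style function(variable, value) labeller, which ggplot2 has deprecated.
plot_labeller <- as_labeller(neg_bar_labels)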

b <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Neg.") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
        ) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Neg.") %>%
                                      subset_samples(.,
                                                     baylor_other_id != "20220606_Neg") %>%
                                      transform_sample_counts(.,
                                                              function(x){ifelse(is.na(x),
                                                                                 0, 
                                                                                 x)})
                              ) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Neg.") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>% 
        my_plot_bar(., fill="species20") + 
        ylab("") +
        xlab("Sample") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(size = 6),
              legend.key.size = unit(3, "mm"),
              legend.title = element_text(size = 6),
              axis.text.x = element_text(color = "white")) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        #facet_wrap (~ treatment, scales = "free_x", nrow = 1) +
        facet_wrap ( ~ treatment,
                    scales= "free_x", nrow=1, labeller = plot_labeller
                    ) +
        ggtitle("Negative controls")

c <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "BAL") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
        ) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "BAL") %>%
                                      transform_sample_counts(.,
                                                              function(x){ifelse(is.na(x),
                                                                                 0, 
                                                                                 x)})
                              ) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "BAL") %>%
           transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   sample_data(phyloseq_temp)$subject_id <- sample_data(phyloseq_temp)$subject_id %>% 
           factor(labels = c("A", "B", "C", "D", "E"))
   phyloseq_temp
  } %>% 
        my_plot_bar(., x = "subject_id", fill="species20") + 
        ylab("") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(size = 6),
              legend.key.size = unit(3, "mm"),
              legend.title = element_text(size = 6),
              axis.title.x = element_blank()) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        #facet_wrap (~ treatment, scales = "free_x", nrow = 1) +
        facet_wrap ( ~ treatment,
                    scales= "free_x", nrow=1) +
        ggtitle("A    BAL")


d <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Nasal") %>% 
           subset_samples(., S.obs != 0)) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Nasal") %>% 
           subset_samples(., S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Nasal") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   sample_data(phyloseq_temp)$subject_id <- sample_data(phyloseq_temp)$subject_id %>% 
           factor(labels = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")) %>%
           as.character()
   phyloseq_temp
  } %>% 
        my_plot_bar(., x = "subject_id", fill="species20") + 
        ylab("") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(size = 6),
              legend.key.size = unit(3, "mm"),
              legend.title = element_text(size = 6),
              axis.title.x = element_blank()) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
#        scale_x_discrete(drop=) +
        facet_wrap(~ treatment,  nrow = 1, drop = T, scales = "free_x") +
        ggtitle("B    Nasal swabs")

e <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Sputum") %>% 
           subset_samples(., S.obs != 0)) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Sputum") %>% 
           subset_samples(., S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Sputum") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   sample_data(phyloseq_temp)$subject_id <- sample_data(phyloseq_temp)$subject_id %>% 
           factor(labels = c("A", "B", "C", "D", "E"))
   phyloseq_temp
  } %>%
        my_plot_bar(., x = "subject_id", fill="species20")+
        ylab("") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(size = 6),
              legend.key.size = unit(3, "mm"),
              legend.title = element_text(size = 6),
              axis.title.x = element_blank()) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap(~ treatment, scales= "free_x", nrow = 1) +
        ggtitle("C    Sputum")


fig2 <- ggarrange(c, c %>% lemon::g_legend() %>% as_ggplot,
                  d, d %>% lemon::g_legend() %>% as_ggplot,
                  e, e %>% lemon::g_legend() %>% as_ggplot,
                  #a, a %>% lemon::g_legend() %>% as_ggplot,
                  #b, b %>% lemon::g_legend() %>% as_ggplot,
                  ncol=2,nrow=3, widths = c(3, 1),
                  legend = "none",
                  align = "hv")

annotate_figure(fig2,
                left = text_grob("Relative abundance",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Subject",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11)
)

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/Figure2.png",   # Output file path
    width = 180, # Width of the plot in mm (see units)
    height = 160, # Height of the plot in mm (see units)
    units = "mm",
    res = 600
) #fixing multiple page issue

annotate_figure(fig2,
                left = text_grob("Relative abundance",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Subject",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11)
)

# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
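
Each panel above repeats the same label clean-up: strip the `s__` prefix, replace underscores, drop brackets, and wrap the name in `<i>` tags for `element_markdown()`. A minimal sketch of a shared helper follows. The name `format_species_label` and the example labels are hypothetical. Unlike the original `gsub(" sp", ...)` and `gsub(" sp.", ...)` calls, it matches "sp" only as a whole word, because those unanchored patterns would also fire inside an epithet that merely begins with "sp". It also leaves "group" without the trailing period that the original appends; add it back if that period is intended.

```r
# Hypothetical helper: turn "s__Genus_species" into a markdown-italic label.
format_species_label <- function(x) {
  x <- sub("^s__", "", x)                   # drop the rank prefix
  x <- gsub("_", " ", x, fixed = TRUE)      # underscores to spaces
  x <- gsub("[", "", x, fixed = TRUE)       # drop square brackets
  x <- gsub("]", "", x, fixed = TRUE)
  # close the italics before a whole-word "sp"/"sp." or "group"
  x <- sub(" sp\\.?( |$)", "</i> sp.\\1", x, perl = TRUE)
  x <- sub(" group( |$)", "</i> group\\1", x, perl = TRUE)
  has_close <- grepl("</i>", x, fixed = TRUE)
  out <- ifelse(has_close, paste0("<i>", x), paste0("<i>", x, "</i>"))
  out[x == "Other"] <- "Other"              # the catch-all level stays upright
  out
}

# Example labels (the last name is made up to show the whole-word match)
format_species_label(c("s__Staphylococcus_aureus",
                       "s__Corynebacterium_sp_KPL1818",
                       "[Other]",
                       "s__Examplea_spinosa"))
```

Applied to the `species20` column, this would replace the block of `gsub()` and `ifelse()` calls in each panel.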

** 3 Candidat and 2 Staphylococcus –> can be used as a reason for doing shotgun sequencing (species-level resolution)

Dolosigranulum - reported present in the nose; Malassezia - fungi; [others] –> [Other]

Fig. S5. Changes of Mock microbial community

This figure will be updated after calculating the species richness of all samples.

barplot_mock_microbe <- a


barplot_mock_microbe

Why is Cryptococcus missing? Double-check with a plot of the top 20 taxa.

tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock")) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock") %>% 
           subset_samples(., S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 20)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="species20") + 
        xlab("Subject") +
        ylab("") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(size = 6),
              legend.key.size = unit(3, "mm"),
              legend.title = element_text(size = 6),
              axis.text.x = element_text(color = "white")) +
        guides(fill=guide_legend(title="Top 20 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 9, name = "Set3"),
                                     RColorBrewer::brewer.pal(n = 12, name = "Paired")
                                     )) +
        facet_wrap (~ treatment, scales = "free_x", nrow = 1)

        #ggtitle("D    Mock")
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +

Cryptococcus neoformans was present; however, it appears to have been assigned to Cryptococcus gattii.
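
The top-N selection used in these plots goes through `data.frame`, `arrange(-.)` and `row.names()`. Since `taxa_sums()` returns a named numeric vector, the same ranking can be sketched in base R as below. The helper name `top_taxa` and the toy abundances are hypothetical; note that `sort()` drops `NA` values by default.

```r
# Hypothetical helper: names of the n largest entries of a named vector.
top_taxa <- function(sums, n = 10) {
  ranked <- sort(sums, decreasing = TRUE)
  names(ranked)[seq_len(min(n, length(ranked)))]
}

# Toy summed abundances (made-up values)
toy_sums <- c(sp_a = 0.10, sp_b = 0.55, sp_c = 0.05, sp_d = 0.30)

top2 <- top_taxa(toy_sums, n = 2)
top2
```

Switching between the top 10 and top 20 views then only requires changing `n`.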

Negative control plot for R2R

Peggy’s comment on 20240912: create a stacked barplot faceted by type of negative control

barplot_neg_microbe <- b        


annotate_figure(barplot_neg_microbe,
                left = text_grob("Relative abundance",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11))

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureC12.png",   # Output file path
    width = 180, # Width of the plot in mm (see units)
    height = 80, # Height of the plot in mm (see units)
    units = "mm",
    res = 600
) #fixing multiple page issue


annotate_figure(barplot_neg_microbe,
                left = text_grob("Relative abundance",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11))


# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
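
The figures above are written with a `png()` / `dev.off()` pair, which requires repeating the plotting call. As an alternative sketch, `ggplot2::ggsave()` writes a plot object in one call and accepts millimetre units directly. The toy plot and the temporary output path below are placeholders. An object returned by `annotate_figure()` should be usable in place of `p`, though this is untested here, as is whether it avoids the multiple-page issue noted in the comments.

```r
library(ggplot2)

# Placeholder plot standing in for the annotated figure
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Temporary file used only for this example
out_file <- file.path(tempdir(), "example_figure.png")

# 180 mm x 80 mm at 600 dpi, matching the dimensions used above
ggsave(out_file, plot = p, width = 180, height = 80, units = "mm", dpi = 600)
```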

Predicted function barplot

This figure is not included in the main text because it contains too many taxonomic groups.

phyloseq$phyloseq_path_cpm <- transform_sample_counts(phyloseq$phyloseq_path_rpk, function(x){x/sum(x)*1000000})
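
The line above rescales each sample's RPK values to copies per million (CPM). A small worked example on a toy matrix shows the arithmetic, and why the later `ifelse(is.na(x), 0, x)` step matters: a sample whose values sum to zero yields `0/0 = NaN`. The matrix values and the name `to_cpm` are made up for illustration.

```r
# Toy RPK matrix: rows are pathways, columns are samples (made-up values)
rpk <- matrix(c(2, 6, 2,
                0, 0, 0),
              nrow = 3,
              dimnames = list(c("pwy1", "pwy2", "pwy3"), c("s1", "s_empty")))

# Same per-sample transformation as in the line above
to_cpm <- function(x) x / sum(x) * 1e6

cpm <- apply(rpk, 2, to_cpm)   # applied column by column, i.e. per sample
cpm[is.na(cpm)] <- 0           # all-zero samples give NaN; reset them to 0

colSums(cpm)
```

Every sample with at least one non-zero value sums to one million after the transformation, while the empty sample stays at zero.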

a <- tax_table(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Mock") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
        ) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Mock") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "pathway"])
   .[, 3] <- gsub("s__", " ", .[, 3])
   .[, 3] <- gsub("_", " ", .[, 3])
   .[, 3] <- gsub("[]]|[[]", "",  .[, 3])
   .[, 3] <- gsub(" sp", " sp.",  .[, 3])
   .[, 3] <- gsub(" sp.", "</i> sp.",  .[, 3])
   .[, 3] <- gsub(" group", "</i> group.",  .[, 3])
   .[, 3] <- ifelse(grepl("Other",.[, 3]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 3]),
                           paste("<i>",  .[, 3], sep = ""),
                           paste("<i>",  .[, 3], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Mock") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="species20") + 
        ylab("Relative abundance") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(), 
              axis.text.x = element_text(size = 0)) +
        guides(fill=guide_legend(title="Top 10 pathways")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap (~ treatment, scales= "free_x", nrow = 1) +
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
        labs(tag="D") + 
        ggtitle("Mock")



b <- tax_table(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Neg.") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
        ) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Neg.") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "pathway"])
   .[, 3] <- gsub("s__", " ", .[, 3])
   .[, 3] <- gsub("_", " ", .[, 3])
   .[, 3] <- gsub("[]]|[[]", "",  .[, 3])
   .[, 3] <- gsub(" sp", " sp.",  .[, 3])
   .[, 3] <- gsub(" sp.", "</i> sp.",  .[, 3])
   .[, 3] <- gsub(" group", "</i> group.",  .[, 3])
   .[, 3] <- ifelse(grepl("Other",.[, 3]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 3]),
                           paste("<i>",  .[, 3], sep = ""),
                           paste("<i>",  .[, 3], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Neg.") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="species20") + 
        ylab("Relative abundance") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(), 
              axis.text.x = element_text(size = 0)) +
        guides(fill=guide_legend(title="Top 10 pathways")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap (~ treatment, scales= "free_x", nrow = 1) +
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
        labs(tag="E") + 
        ggtitle("Negative controls")



c <- tax_table(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "BAL") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "BAL") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "pathway"])
   .[, 3] <- gsub("s__", " ", .[, 3])
   .[, 3] <- gsub("_", " ", .[, 3])
   .[, 3] <- gsub("[]]|[[]", "",  .[, 3])
   .[, 3] <- gsub(" sp", " sp.",  .[, 3])
   .[, 3] <- gsub(" sp.", "</i> sp.",  .[, 3])
   .[, 3] <- gsub(" group", "</i> group.",  .[, 3])
   .[, 3] <- ifelse(grepl("Other",.[, 3]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 3]),
                           paste("<i>",  .[, 3], sep = ""),
                           paste("<i>",  .[, 3], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "BAL") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="species20") + 
        ylab("Relative abundance") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(), axis.text.x = element_blank()) +
        guides(fill=guide_legend(title="Top 10 pathways")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap (~ treatment, scales= "free_x", nrow = 1) +
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
        labs(tag="A") + 
        ggtitle("BAL")



d <- tax_table(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Nasal") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
        ) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Nasal") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "pathway"])
   .[, 3] <- gsub("s__", " ", .[, 3])
   .[, 3] <- gsub("_", " ", .[, 3])
   .[, 3] <- gsub("[]]|[[]", "",  .[, 3])
   .[, 3] <- gsub(" sp", " sp.",  .[, 3])
   .[, 3] <- gsub(" sp.", "</i> sp.",  .[, 3])
   .[, 3] <- gsub(" group", "</i> group.",  .[, 3])
   .[, 3] <- ifelse(grepl("Other",.[, 3]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 3]),
                           paste("<i>",  .[, 3], sep = ""),
                           paste("<i>",  .[, 3], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Nasal") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="species20") + 
        ylab("Relative abundance") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(), axis.text.x = element_blank()) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap (~ treatment, scales= "free_x", nrow = 1) +
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
        labs(tag="B") + 
        ggtitle("Nasal swabs")



e <- tax_table(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Sputum") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Sputum") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "pathway"])
   .[, 3] <- gsub("s__", " ", .[, 3])
   .[, 3] <- gsub("_", " ", .[, 3])
   .[, 3] <- gsub("[]]|[[]", "",  .[, 3])
   .[, 3] <- gsub(" sp", " sp.",  .[, 3])
   .[, 3] <- gsub(" sp.", "</i> sp.",  .[, 3])
   .[, 3] <- gsub(" group", "</i> group.",  .[, 3])
   .[, 3] <- ifelse(grepl("Other",.[, 3]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 3]),
                           paste("<i>",  .[, 3], sep = ""),
                           paste("<i>",  .[, 3], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_path_cpm,
                              sample_type == "Sputum") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="species20") + 
        ylab("Relative abundance") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(), axis.text.x = element_blank()) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap (~ treatment, scales= "free_x", nrow = 1) +
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
        labs(tag="C") + 
        ggtitle("Sputum")


a

b

c

d

e
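
Each panel above repeats the same label clean-up before plotting: strip the `s__` prefix, replace underscores with spaces, drop square brackets, and wrap the name in `<i>` tags so that `element_markdown()` renders it in italics. The block below is a minimal sketch of that step as a single reusable function. The name `format_taxon_label` is hypothetical and not part of the original analysis. The sketch also differs from the code above on purpose: it matches `sp` and `group` only as trailing words, trims whitespace, and does not append a period after `group`.

```r
# Hypothetical helper (not in the original analysis): turn raw taxon names
# into markdown labels with the Latin name in italics.
format_taxon_label <- function(x) {
  x <- gsub("s__", "", x, fixed = TRUE)   # drop the rank prefix
  x <- gsub("_", " ", x, fixed = TRUE)    # underscores to spaces
  x <- gsub("[", "", x, fixed = TRUE)     # drop square brackets
  x <- gsub("]", "", x, fixed = TRUE)
  x <- trimws(x)

  is_other <- grepl("Other", x, fixed = TRUE)

  # Close the italic span before a trailing "sp" or "group" so only the
  # Latin name is italicised.
  x <- sub(" sp$", "</i> sp.", x)
  x <- sub(" group$", "</i> group", x)

  out <- ifelse(grepl("</i>", x, fixed = TRUE),
                paste0("<i>", x),
                paste0("<i>", x, "</i>"))
  out[is_other] <- "Other"
  out
}

format_taxon_label(c("s__Rothia_mucilaginosa", "s__Prevotella_sp", "[Other]"))
```

Anchoring the patterns with `$` is a design choice in this sketch: it keeps species epithets that merely begin with "sp" from being rewritten.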

v. Was there any bias in the mock community?

Gram-stain analysis

Fig. S6. Bar plot of gram-stain

Fig. S6. Bar plot annotated with gram-stain information of (A) BAL, (B) nasal swabs, (C) sputum, and (D) mock communities after each treatment.

barplot_mock_gramproportion <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock")) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock") %>% 
           subset_samples(., S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Mock") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="gram_stain") + 
        ylab("") +
        xlab("Subject") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(), 
              legend.position = "right",
              axis.title.y = element_blank(),
              axis.text.x = element_blank()) +
        guides(fill=guide_legend(title="Gram-stain")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap (~ treatment, scales = "free_x", nrow = 1) +
        labs(tag = "B")
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +


b <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Neg.") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
        ) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Neg.") %>%
                                      subset_samples(.,
                                                     baylor_other_id != "20220606_Neg") %>%
                                      transform_sample_counts(.,
                                                              function(x){ifelse(is.na(x),
                                                                                 0, 
                                                                                 x)})
                              ) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Neg.") %>%
        subset_samples(.,
                       baylor_other_id != "20220606_Neg") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>% 
        my_plot_bar(., fill="gram_stain") + 
        ylab("") +
        xlab("Subject") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(),
              axis.text.x = element_blank()) +
        guides(fill=guide_legend(title="Gram-stain")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        #facet_wrap (~ treatment, scales = "free_x", nrow = 1) +
        facet_wrap ( ~ treatment,
                    scales= "free_x", nrow=1) +
        ggtitle("E  Negative controls")

c <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "BAL") %>%
                transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
        ) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "BAL") %>%
                                      transform_sample_counts(.,
                                                              function(x){ifelse(is.na(x),
                                                                                 0, 
                                                                                 x)})
                              ) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "BAL") %>%
           transform_sample_counts(., function(x){ifelse(is.na(x), 0, x)})
   tax_table(phyloseq_temp) <- tax_table(.) 
   sample_data(phyloseq_temp)$subject_id <- sample_data(phyloseq_temp)$subject_id %>% 
           factor(labels = c("A", "B", "C", "D", "E"))
   phyloseq_temp
  } %>% 
        my_plot_bar(., x = "treatment", fill="gram_stain") + 
        ylab("") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(),
              axis.title.y = element_blank(),
              axis.text.x = element_text(angle = 45, hjust=1, size = 7,
                                         color = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3")),

              axis.title.x = element_blank()) +
        guides(fill=guide_legend(title="Gram-stain")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        #facet_wrap (~ treatment, scales = "free_x", nrow = 1) +
        facet_wrap ( ~ subject_id,
                    scales= "free_x", nrow=1) +
        ggtitle("A  BAL")


d <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Nasal") %>% 
           subset_samples(., S.obs != 0)) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Nasal") %>% 
           subset_samples(., S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Nasal") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   sample_data(phyloseq_temp)$subject_id <- sample_data(phyloseq_temp)$subject_id %>% 
           factor(labels = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")) %>% 
           as.character()
   sample_data(phyloseq_temp)$treatment_order <- sample_data(phyloseq_temp)$treatment %>% 
           as.numeric() %>%
           as.character()
   phyloseq_temp
  } %>% 
        my_plot_bar(., x = "treatment_order", fill="gram_stain") + 
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.position = "none",
              axis.title.x = element_blank(),
              axis.text.x = element_text(angle = 45, hjust=1, size = 7,
                                         color = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3")),
              axis.title.y = element_blank()) +
        guides(fill=guide_legend(title="Gram-stain")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        scale_x_discrete(breaks = c("1", "2", "3", "4", "5", "6"),
                         labels = c("Untreated", "lyPMA", "Benzonase", "HostZero", "MolYsis", "QIAamp")) +
        ggtitle("B  Nasal swabs")

d <- ggplotGrob(d + facet_grid(~ subject_id, drop = TRUE, scales = "free", space = "free"))



d$grobs[[22]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "lyPMA"     "QIAamp"
d$grobs[[23]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "lyPMA"     "QIAamp"
d$grobs[[22]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#fb9a99", "#a6cee3")
d$grobs[[23]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#fb9a99", "#a6cee3")

d$grobs[[24]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "Benzonase" "HostZero"  "MolYsis"
d$grobs[[25]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "Benzonase" "HostZero"  "MolYsis"
d$grobs[[26]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "Benzonase" "HostZero"  "MolYsis"
d$grobs[[27]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "Benzonase" "HostZero"  "MolYsis"
d$grobs[[24]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#33a02c", "#b2df8a", "#1f78b4")
d$grobs[[25]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#33a02c", "#b2df8a", "#1f78b4")
d$grobs[[26]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#33a02c", "#b2df8a", "#1f78b4")
d$grobs[[27]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#33a02c", "#b2df8a", "#1f78b4")



d$grobs[[28]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "lyPMA"     "Benzonase" "HostZero"
d$grobs[[28]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c","#fb9a99", "#33a02c", "#b2df8a")

d$grobs[[29]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "MolYsis"   "QIAamp"
d$grobs[[29]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#1f78b4", "#a6cee3")

d$grobs[[30]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "lyPMA"     "QIAamp"
d$grobs[[31]]$children[[2]]$grobs[[2]]$children[[1]]$label
## [1] "Untreated" "lyPMA"     "QIAamp"
d$grobs[[30]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#fb9a99", "#a6cee3")
d$grobs[[31]]$children[[2]]$grobs[[2]]$children[[1]]$gp$col <- c("#e31a1c", "#fb9a99", "#a6cee3")
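
The colour vectors assigned above are written out by hand for each facet. As a rough sketch, the same vectors can be derived from the tick labels with a named lookup, which keeps each treatment tied to one colour. The names `treatment_cols` and `label_colours` are hypothetical; the label-to-colour mapping is taken from the `scale_x_discrete()` labels and the `axis.text.x` colours used for panel B.

```r
# Hypothetical lookup (not in the original analysis): one colour per treatment.
treatment_cols <- c(Untreated = "#e31a1c",
                    lyPMA     = "#fb9a99",
                    Benzonase = "#33a02c",
                    HostZero  = "#b2df8a",
                    MolYsis   = "#1f78b4",
                    QIAamp    = "#a6cee3")

# Return the colours for a vector of tick labels, in label order.
label_colours <- function(labels, palette = treatment_cols) {
  unknown <- setdiff(labels, names(palette))
  if (length(unknown) > 0) {
    stop("No colour defined for: ", paste(unknown, collapse = ", "))
  }
  unname(palette[labels])
}

label_colours(c("Untreated", "lyPMA", "QIAamp"))
label_colours(c("Untreated", "Benzonase", "HostZero", "MolYsis"))
```

The returned vector could then be assigned to the `gp$col` slot of a tick-label grob in place of a hard-coded vector.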




e <- tax_table(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Sputum") %>% 
           subset_samples(., S.obs != 0)) %>%
        cbind(species20 = "[Other]") %>%
        {top20species <- head(taxa_sums(subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Sputum") %>% 
           subset_samples(., S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- gsub("s__", " ", .[, 9])
   .[, 9] <- gsub("_", " ", .[, 9])
   .[, 9] <- gsub("[]]|[[]", "",  .[, 9])
   .[, 9] <- gsub(" sp", " sp.",  .[, 9])
   .[, 9] <- gsub(" sp.", "</i> sp.",  .[, 9])
   .[, 9] <- gsub(" group", "</i> group.",  .[, 9])
   .[, 9] <- ifelse(grepl("Other",.[, 9]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 9]),
                           paste("<i>",  .[, 9], sep = ""),
                           paste("<i>",  .[, 9], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq$phyloseq_rel,
                              sample_type == "Sputum") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   sample_data(phyloseq_temp)$subject_id <- sample_data(phyloseq_temp)$subject_id %>% 
           factor(labels = c("A", "B", "C", "D", "E"))
   phyloseq_temp
  } %>%
        my_plot_bar(., x = "treatment", fill="gram_stain")+
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(),
              axis.title.y = element_blank(),
              axis.text.x = element_text(angle = 45, hjust=1, size = 7, 
                                         color = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3")),

              axis.title.x = element_blank()) +
        guides(fill=guide_legend(title="Gram-stain")) +
        #scale_x_discrete(drop=F) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 11, name = "Paired"))) +
        facet_wrap ( ~ subject_id,
                    scales= "free_x", nrow=1) +
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
        ggtitle("C  Sputum")



figS6 <- ggarrange(c, d %>% as_ggplot, e, ncol = 1, common.legend = TRUE, legend = "top")



annotate_figure(figS6,
                left = text_grob("Relative abundance",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Treatment",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11)
)

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS6_updated.png",   # Output file path
    width = 180, # Plot width, in the units given below (mm)
    height = 200, # Plot height, in the units given below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue


annotate_figure(figS6,
                left = text_grob("Relative abundance",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Treatment",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11)
)


# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
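
The `png()` call above sizes the figure in millimetres at 600 dpi. As a quick worked check of the resulting image size, the sketch below converts millimetres to pixels with 25.4 mm per inch. The helper name `mm_to_px` is hypothetical, and the exact rounding applied by the graphics device may differ slightly.

```r
# Hypothetical helper: approximate pixel size of a device given in mm.
mm_to_px <- function(mm, res) {
  round(mm / 25.4 * res)   # mm -> inches -> pixels
}

mm_to_px(c(width = 180, height = 200), res = 600)
# width 4252, height 4724
```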

Gram-stain stats (all samples)

Effect sizes, standard errors (SE), and p-values from a linear mixed-effects model of the gram-negative proportion: lmer(Gram-negative proportion ~ sample_type + treatment + sample_type * treatment + (1|subject_id)).

The interaction term was significant (p-value = 0.018771).

Raw results

lmer_gram_stain_proportion <- lmer(gram_neg_prop ~ sample_type + treatment + sample_type * treatment + (1|subject_id),
     data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum")))

lmer_gram_stain_proportion %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: gram_neg_prop ~ sample_type + treatment + sample_type * treatment +  
##     (1 | subject_id)
##    Data: sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL",  
##     "Nasal", "Sputum"))
## 
## REML criterion at convergence: 699.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2364 -0.2334 -0.0046  0.2513  2.4769 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 309.8    17.60   
##  Residual               231.6    15.22   
## Number of obs: 95, groups:  subject_id, 20
## 
## Fixed effects:
##                                      Estimate Std. Error      df t value
## (Intercept)                            32.848     10.406  36.974   3.157
## sample_typeNasal                      -32.005     12.744  36.974  -2.511
## sample_typeSputum                      28.794     14.716  36.974   1.957
## treatmentlyPMA                          6.416      9.624  61.541   0.667
## treatmentBenzonase                     15.332      9.624  61.541   1.593
## treatmentHostZERO                       8.609      9.624  61.541   0.895
## treatmentMolYsis                        3.372      9.624  61.541   0.350
## treatmentQIAamp                         7.306      9.624  61.541   0.759
## sample_typeNasal:treatmentlyPMA        13.747     13.087  63.733   1.050
## sample_typeSputum:treatmentlyPMA      -47.305     13.611  61.541  -3.476
## sample_typeNasal:treatmentBenzonase   -14.674     13.117  64.032  -1.119
## sample_typeSputum:treatmentBenzonase  -67.812     13.611  61.541  -4.982
## sample_typeNasal:treatmentHostZERO     -9.817     13.117  64.032  -0.748
## sample_typeSputum:treatmentHostZERO   -68.467     13.611  61.541  -5.030
## sample_typeNasal:treatmentMolYsis      -1.847     13.087  63.733  -0.141
## sample_typeSputum:treatmentMolYsis    -63.239     13.611  61.541  -4.646
## sample_typeNasal:treatmentQIAamp       -5.940     13.117  64.032  -0.453
## sample_typeSputum:treatmentQIAamp     -67.898     13.611  61.541  -4.989
##                                      Pr(>|t|)    
## (Intercept)                           0.00317 ** 
## sample_typeNasal                      0.01653 *  
## sample_typeSputum                     0.05797 .  
## treatmentlyPMA                        0.50747    
## treatmentBenzonase                    0.11627    
## treatmentHostZERO                     0.37451    
## treatmentMolYsis                      0.72725    
## treatmentQIAamp                       0.45066    
## sample_typeNasal:treatmentlyPMA       0.29749    
## sample_typeSputum:treatmentlyPMA      0.00094 ***
## sample_typeNasal:treatmentBenzonase   0.26746    
## sample_typeSputum:treatmentBenzonase 5.40e-06 ***
## sample_typeNasal:treatmentHostZERO    0.45697    
## sample_typeSputum:treatmentHostZERO  4.52e-06 ***
## sample_typeNasal:treatmentMolYsis     0.88821    
## sample_typeSputum:treatmentMolYsis   1.83e-05 ***
## sample_typeNasal:treatmentQIAamp      0.65221    
## sample_typeSputum:treatmentQIAamp    5.27e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA on lmer result

lmer_gram_stain_proportion %>% anova 
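
As a rough illustration of what an interaction test of this kind checks, the sketch below compares a main-effects model with a model that adds the sample type by treatment interaction. It is a simplified stand-in, not the analysis itself: it uses simulated data, plain `lm()` fits without the `(1|subject_id)` random intercept, and the object names (`toy`, `fit_main`, `fit_full`) are hypothetical.

```r
# Simulated data (assumption): treatment lowers the gram-negative
# proportion in sputum only, so an interaction is present by construction.
set.seed(1)
toy <- expand.grid(sample_type = c("BAL", "Sputum"),
                   treatment   = c("Untreated", "Depleted"),
                   replicate   = 1:6)
toy$gram_neg_prop <- 40 +
  ifelse(toy$sample_type == "Sputum" & toy$treatment == "Depleted", -30, 0) +
  rnorm(nrow(toy), sd = 2)

# Nested models: with and without the interaction term.
fit_main <- lm(gram_neg_prop ~ sample_type + treatment, data = toy)
fit_full <- lm(gram_neg_prop ~ sample_type * treatment, data = toy)

# F-test for the added interaction term.
interaction_test <- anova(fit_main, fit_full)
p_interaction <- interaction_test[["Pr(>F)"]][2]
p_interaction
```

A small p-value here means the treatment effect differs between sample types, which is the usual motivation for analysing each sample type separately.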

Table S5. Gram-stain stats -stratified

Stratified analysis

Table S5. Effect sizes, standard errors (SE), and p-values from statistical tests on the gram-negative proportion using linear mixed-effects models. Stratified analyses were conducted for each sample type because the interaction term between sample type and treatment was significant in an ANOVA test (p-value < 0.001) of the model LMER(Gram-negative proportion ~ sample type + treatment + sample type * treatment + (1|subject id)). The baseline of the categorical variables is untreated BAL, and statistical significance is noted with *: p-value < 0.05 and ***: p-value < 0.001.
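
The stratified approach can be sketched as splitting the data by sample type and fitting one treatment model per stratum. The example below is a simplified illustration under stated assumptions: the data are simulated, the fits use `lm()` rather than `lmer()` with a subject random intercept, and the names `toy`, `fit_by_type`, and `effect_by_type` are hypothetical.

```r
# Simulated data (assumption): a treatment effect in sputum only.
set.seed(2)
toy <- expand.grid(sample_type = c("BAL", "Sputum"),
                   treatment   = c("Untreated", "Depleted"),
                   replicate   = 1:6)
toy$gram_neg_prop <-
  ifelse(toy$sample_type == "Sputum" & toy$treatment == "Depleted", 5, 50) +
  rnorm(nrow(toy), sd = 3)

# One model per sample type, each with untreated samples as the baseline.
fit_by_type <- lapply(split(toy, toy$sample_type),
                      function(d) lm(gram_neg_prop ~ treatment, data = d))

# Collect effect size, SE, and p-value of the treatment term per stratum.
effect_by_type <- t(sapply(fit_by_type, function(fit) {
  coef(summary(fit))["treatmentDepleted",
                     c("Estimate", "Std. Error", "Pr(>|t|)")]
}))
effect_by_type
```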

Raw results - Mock

##Mock
gram_neg_prop_mock <- 
        lm(gram_neg_prop ~ treatment,
     data = sample_data(phyloseq$phyloseq_rel) %>% data.frame %>% subset(., .$sample_type %in% c("Mock"))) 

gram_neg_prop_mock %>%
        summary
## 
## Call:
## lm(formula = gram_neg_prop ~ treatment, data = sample_data(phyloseq$phyloseq_rel) %>% 
##     data.frame %>% subset(., .$sample_type %in% c("Mock")))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2550  -0.0878  -0.0101   0.1161  17.2213 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          51.131      2.144  23.848  < 2e-16 ***
## treatmentlyPMA      -23.519      3.180  -7.396 9.54e-08 ***
## treatmentBenzonase  -51.082      3.180 -16.063 1.11e-14 ***
## treatmentHostZERO   -51.131      3.180 -16.078 1.08e-14 ***
## treatmentMolYsis    -50.638      3.180 -15.923 1.35e-14 ***
## treatmentQIAamp     -51.052      3.180 -16.054 1.12e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.252 on 25 degrees of freedom
## Multiple R-squared:  0.9497, Adjusted R-squared:  0.9397 
## F-statistic: 94.46 on 5 and 25 DF,  p-value: 2.085e-15

BAL

##BAL
gram_neg_prop_bal <- 
lmer(gram_neg_prop ~ treatment + (1|subject_id),
     data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL"))) 


gram_neg_prop_bal %>%
        summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: gram_neg_prop ~ treatment + (1 | subject_id)
##    Data: sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL"))
## 
## REML criterion at convergence: 240.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.07050 -0.39650  0.00736  0.38972  1.58848 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 1222.6   34.97   
##  Residual                575.6   23.99   
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error     df t value Pr(>|t|)
## (Intercept)          32.848     18.964  7.248   1.732    0.125
## treatmentlyPMA        6.416     15.174 20.000   0.423    0.677
## treatmentBenzonase   15.332     15.174 20.000   1.010    0.324
## treatmentHostZERO     8.609     15.174 20.000   0.567    0.577
## treatmentMolYsis      3.372     15.174 20.000   0.222    0.826
## treatmentQIAamp       7.306     15.174 20.000   0.481    0.635
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.400                            
## trtmntBnzns -0.400  0.500                     
## trtmntHZERO -0.400  0.500  0.500              
## trtmntMlYss -0.400  0.500  0.500  0.500       
## trtmntQIAmp -0.400  0.500  0.500  0.500  0.500

Nasal

##Nasal
gram_neg_prop_ns <- 
lmer(gram_neg_prop ~ treatment + (1|subject_id),
     data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal"))) 


gram_neg_prop_ns %>%
        summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: gram_neg_prop ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal"))
## 
## REML criterion at convergence: 191.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1779 -0.2404 -0.0354  0.2250  2.9273 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept)  5.174   2.275   
##  Residual               25.784   5.078   
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)         0.84374    1.75949 26.89687   0.480    0.635    
## treatmentlyPMA     19.41142    2.84800 23.04516   6.816  5.9e-07 ***
## treatmentBenzonase  1.89039    2.85065 23.31583   0.663    0.514    
## treatmentHostZERO   0.02514    2.85065 23.31583   0.009    0.993    
## treatmentMolYsis    2.27684    2.84800 23.04516   0.799    0.432    
## treatmentQIAamp     0.13386    2.85065 23.31583   0.047    0.963    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.515                            
## trtmntBnzns -0.514  0.286                     
## trtmntHZERO -0.514  0.286  0.365              
## trtmntMlYss -0.515  0.272  0.349  0.349       
## trtmntQIAmp -0.514  0.349  0.269  0.269  0.286

Sputum

##Sputum
gram_neg_prop_spt <- 
lmer(gram_neg_prop ~ treatment + (1|subject_id),
     data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum"))) 

gram_neg_prop_spt %>%
        summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: gram_neg_prop ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum"))
## 
## REML criterion at convergence: 194
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.13570 -0.23729  0.08356  0.34246  2.24179 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept)  39.45    6.281  
##  Residual               103.99   10.198  
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          61.642      5.356  17.414  11.509 1.44e-09 ***
## treatmentlyPMA      -40.888      6.450  20.000  -6.340 3.46e-06 ***
## treatmentBenzonase  -52.480      6.450  20.000  -8.137 8.96e-08 ***
## treatmentHostZERO   -59.857      6.450  20.000  -9.281 1.09e-08 ***
## treatmentMolYsis    -59.867      6.450  20.000  -9.282 1.09e-08 ***
## treatmentQIAamp     -60.592      6.450  20.000  -9.395 8.94e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.602                            
## trtmntBnzns -0.602  0.500                     
## trtmntHZERO -0.602  0.500  0.500              
## trtmntMlYss -0.602  0.500  0.500  0.500       
## trtmntQIAmp -0.602  0.500  0.500  0.500  0.500
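
The fixed effects above use R's default treatment coding, so the intercept is the estimated value for the reference group (untreated) and every other coefficient is a difference from it. A minimal sketch of that arithmetic, using estimates copied by hand from the sputum output (the vector name `fixef_spt` is an illustration, not an object from the analysis):

```r
# Fixed-effect estimates copied from the sputum model summary above
fixef_spt <- c("(Intercept)" = 61.642,
               lyPMA         = -40.888,
               Benzonase     = -52.480,
               HostZERO      = -59.857,
               MolYsis       = -59.867,
               QIAamp        = -60.592)

# Intercept = untreated estimate; intercept + coefficient = treated estimate
group_means <- c(Untreated = fixef_spt[["(Intercept)"]],
                 fixef_spt[["(Intercept)"]] + fixef_spt[-1])

round(group_means, 1)
```

For example, the lyPMA estimate is 61.642 - 40.888 = 20.754, in the units of `gram_neg_prop`.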

Tidy summarized table

gram_neg_prop_mock_kbl <- gram_neg_prop_mock %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              gram_neg_prop_mock %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]

gram_neg_prop_ns_kbl  <- gram_neg_prop_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              gram_neg_prop_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


gram_neg_prop_spt_kbl <- gram_neg_prop_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              gram_neg_prop_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]



gram_neg_prop_bal_kbl <- gram_neg_prop_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              gram_neg_prop_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


tableS5 <- cbind(gram_neg_prop_mock_kbl, gram_neg_prop_bal_kbl, gram_neg_prop_ns_kbl, gram_neg_prop_spt_kbl) %>%
        kbl(format = "html", escape = FALSE) %>% 
        add_header_above(c(" " = 1, "Mock" = 3, "BAL" = 3, "Nasal swab" = 3, "Sputum" = 3)) %>% 
        kable_styling(full_width = FALSE, html_font = "sans") 

tableS5
| | Mock: Effect size (95% CI) | p-value | | BAL: Effect size (95% CI) | p-value | | Nasal swab: Effect size (95% CI) | p-value | | Sputum: Effect size (95% CI) | p-value | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 51.1 (46.7, 55.5) | 0 | *** | 32.8 (-5.1, 70.7) | 0.125 | | 0.8 (-2.4, 4.1) | 0.635 | | 61.6 (51.8, 71.5) | 0 | *** |
| lyPMA | -23.5 (-30.1, -17.0) | 0 | *** | 6.4 (-21.2, 34.1) | 0.677 | | 19.4 (14.1, 24.6) | 0.000 | *** | -40.9 (-52.6, -29.1) | 0 | *** |
| Benzonase | -51.1 (-57.6, -44.5) | 0 | *** | 15.3 (-12.3, 43.0) | 0.324 | | 1.9 (-3.4, 7.5) | 0.514 | | -52.5 (-64.2, -40.7) | 0 | *** |
| HostZERO | -51.1 (-57.7, -44.6) | 0 | *** | 8.6 (-19.0, 36.3) | 0.577 | | 0.0 (-5.3, 5.6) | 0.993 | | -59.9 (-71.6, -48.1) | 0 | *** |
| MolYsis | -50.6 (-57.2, -44.1) | 0 | *** | 3.4 (-24.3, 31.0) | 0.826 | | 2.3 (-3.0, 7.6) | 0.432 | | -59.9 (-71.6, -48.1) | 0 | *** |
| QIAamp | -51.1 (-57.6, -44.5) | 0 | *** | 7.3 (-20.4, 35.0) | 0.635 | | 0.1 (-5.4, 5.5) | 0.963 | | -60.6 (-72.3, -48.8) | 0 | *** |
save_kable(tableS5, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS5.html", self_contained = T)
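
The four table-building pipelines above differ only in the model they start from, so the shared steps could be factored into one function. The sketch below is a hypothetical helper (`format_effects` is not part of the original analysis) written in base R. It takes a coefficient matrix and a matrix of confidence limits, and the demo values are copied from the sputum results shown earlier.

```r
# Hypothetical helper: combine estimates, confidence limits and p-values
# into the "Effect size (95% CI)" / p-value / stars layout used in Table S5.
format_effects <- function(coefs, ci, digits = 1) {
  fmt <- function(x) formatC(x, format = "f", digits = digits)
  p <- coefs[, "Pr(>|t|)"]
  # Same thresholds as the case_when() calls: <0.001, <0.01, <0.05
  stars <- as.character(cut(p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                            labels = c("***", "**", "*", " "), right = FALSE))
  # Keep only the confidence limits that belong to fixed effects
  ci <- ci[rownames(coefs), , drop = FALSE]
  out <- data.frame(
    effect = paste0(fmt(coefs[, "Estimate"]), " (",
                    fmt(ci[, 1]), ", ", fmt(ci[, 2]), ")"),
    p = round(p, 3),
    sig = stars,
    row.names = sub("^treatment", "", rownames(coefs)),
    stringsAsFactors = FALSE
  )
  names(out) <- c("Effect size (95% CI)", "p-value", " ")
  out
}

# Demo with two rows copied from the sputum model output and Table S5
coefs <- rbind("(Intercept)"    = c(61.642, 1.44e-09),
               "treatmentlyPMA" = c(-40.888, 3.46e-06))
colnames(coefs) <- c("Estimate", "Pr(>|t|)")
ci <- rbind("(Intercept)"    = c(51.8, 71.5),
            "treatmentlyPMA" = c(-52.6, -29.1))

tab <- format_effects(coefs, ci)
tab
```

With a fitted model, a call such as `format_effects(coef(summary(gram_neg_prop_spt)), confint(gram_neg_prop_spt))` should work the same way, assuming the summary carries a `Pr(>|t|)` column as in the lmerTest output above. Rounding is done with `formatC()` here, so edge cases may format slightly differently from the original `round()` plus `format()` chain.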

vi. Did diversity metrics change?

Taxa change

Fig. 3 Alpha and beta diversity

Fig. 3. Alpha and beta diversity by sample type and treatment method after removing potential contaminants and rare taxa. (A) Species richness, with statistical test results from linear mixed-effects models stratified by sample type. (B) Morisita-Horn dissimilarity within subject between each treatment and the untreated sample; squares show the median and bars show the 95% confidence intervals.

  • PSL comments (20230630):
    • Figure 3A: some of the boxplot colors are hard to see because the interquartile range is narrow. Can you use stat_summary with geom = "pointrange", as Maghini DG et al. did for their Figure 3b, or create a dotplot + pointrange geom like their Figure 3c? I found some example code by googling, though you might want to show median + IQR rather than mean + SD: stat_summary(fun = mean, geom = "pointrange", fun.max = function(x) mean(x) + sd(x), fun.min = function(x) mean(x) - sd(x))
    • Figure 3A: we also need to talk, because the mock community is only supposed to have a species richness of 10. I know that some of this may be due to taxonomic misclassification, but it will immediately raise issues with reviewers, so we need to anticipate how to address it in advance.
    • Figure 3A: add relevant p-values using horizontal bars and stars for significance, as Maghini DG et al. did for their Figure 3B.
    • Figure 3A: why not just do what Maghini DG et al. did for their Figure 3e, but with one panel per sample type?
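
The first comment suggests showing median + IQR rather than mean + SD. A minimal sketch of a summary function that could be passed to `stat_summary(fun.data = ...)` for that purpose; the name `median_iqr` is hypothetical and the function is not used in the figures in this document:

```r
# Hypothetical summary function: returns the median as the point and the
# first and third quartiles as the range, in the y / ymin / ymax layout
# that a pointrange geom expects.
median_iqr <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75),
                       na.rm = TRUE, names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

# Toy input, for illustration only
res <- median_iqr(c(1, 2, 3, 4, 5))
res
```

In a plot this would replace the `mean_sdl` call, for example `stat_summary(aes(color = treatment), fun.data = median_iqr, geom = "pointrange", size = 0.4)`.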
f3a <- ggplot(subset(sample_data(phyloseq$phyloseq_count) %>% 
                             data.frame, sample_data(phyloseq$phyloseq_count)$sample_type %in% c(#"Mock",
                                     "Sputum", "Nasal", "BAL")), aes(x = treatment, y = S.obs)) +
        geom_jitter(aes(color = treatment),
                    position = position_jitter(0.2), size = 1.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        #stat_summary(fun = mean,
        #       geom = "pointrange",
        #       fun.max = function(x) mean(x) + sd(x),
        #       fun.min = function(x) mean(x) - sd(x))+
        ylab("Species richness") +
        xlab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        scale_fill_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        labs(tag = "A") +
        theme(plot.tag = element_text(size = 15),
              axis.text.x = element_blank(),
              axis.ticks.x = element_blank(),
              legend.position = "top") +
        facet_wrap(~sample_type, nrow = 1) + 
        guides(col = guide_legend(nrow = 1)) 

dat_text <- data.frame(
  label = c(
          #"", "***", "***", "**", "***", #label for Mock
          "", "", "", "*", "", #label for BAL
          "", "", "***", "*", "**", 
          "**", "***", "***", "***", "***"),
  sample_type = c(
          #"Mock", "Mock", "Mock", "Mock", "Mock", 
          "BAL", "BAL", "BAL", "BAL", "BAL", 
          "Nasal", "Nasal", "Nasal", "Nasal", "Nasal", 
          "Sputum", "Sputum", "Sputum", "Sputum", "Sputum"),
  treatment     = c(
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp", 
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
  S.obs = c(
          #50, 30, 35, 33, 31,
          30, 35, 50, 52, 50,
          30, 30, 30, 35, 30,
          100, 120, 147, 140, 125)
)



dat_text$sample_type <- factor(dat_text$sample_type, levels = c("BAL", "Nasal", "Sputum"
                                                                ))
dat_text$treatment <- factor(dat_text$treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))


f3a <- f3a + geom_text(
  data    = dat_text,
  mapping = aes(x = treatment, y = S.obs, label = label)
)



f5S_mock <- ggplot(subset(sample_data(phyloseq$phyloseq_count) %>% 
                             data.frame,
                          sample_data(phyloseq$phyloseq_count)$sample_type %in%
                                  c("Mock")),
                   aes(x = treatment, y = S.obs)) +
        geom_jitter(aes(color = treatment), position = position_jitter(0.2),
                    size = 1.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        ylab("Species richness") +
        xlab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        labs(tag = "C") +
        theme(plot.tag = element_text(size = 15),
              axis.text.x = element_blank(),
              axis.ticks.x = element_blank(),
              legend.position = "top") +
        facet_wrap(~sample_type, nrow = 1) + 
        guides(col = guide_legend(nrow = 1)) 

dat_text <- data.frame(
  label = c(
          "", "***", "***", "**", "***"#, #label for Mock
          #"", "", "", "*", "", #label for BAL
          #"", "", "***", "*", "**", 
          #"**", "***", "***", "***", "***"
          ),
  sample_type = c(
          "Mock", "Mock", "Mock", "Mock", "Mock"#, 
          #"BAL", "BAL", "BAL", "BAL", "BAL", 
          #"Nasal", "Nasal", "Nasal", "Nasal", "Nasal", 
          #"Sputum", "Sputum", "Sputum", "Sputum", "Sputum"
          ),
  treatment = c(
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"#, 
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"
          ),
  S.obs = c(
          50, 30, 35, 33, 31
          #30, 35, 50, 52, 50,
          #30, 30, 30, 35, 30,
          #100, 120, 147, 140, 125
          )
)



dat_text$sample_type <- factor(dat_text$sample_type, levels = c("Mock"#, 
                                                                #"BAL", "Nasal", "Sputum"
                                                                ))
dat_text$treatment <- factor(dat_text$treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))


f5S_mock <- f5S_mock + geom_text(
  data    = dat_text,
  mapping = aes(x = treatment, y = S.obs, label = label)
)



# Subset to samples with non-zero richness, excluding negative controls

phyloseq_rel_nz <- phyloseq$phyloseq_rel %>%
        subset_samples(S.obs != 0 & sample_type %in% c("Mock", 
                                                       "BAL", "Nasal", "Sputum"))

#distances of betadiversity - boxplots
horn_dist_long <- distance(phyloseq_rel_nz, method="horn") %>% as.matrix() %>% melt_dist() #making long data of distance matrices
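
For reference, the Morisita-Horn dissimilarity between two abundance vectors x and y with totals Nx and Ny is 1 - 2 Σ(x_i y_i) / ((λx + λy) Nx Ny), where λx = Σ(x_i²) / Nx² and λy is defined likewise. The sketch below computes it in base R for a single pair. It is an illustration of the formula, not the routine that `distance(method = "horn")` calls, and `morisita_horn` is a hypothetical name.

```r
# Hypothetical stand-alone implementation of Morisita-Horn dissimilarity
morisita_horn <- function(x, y) {
  nx <- sum(x)
  ny <- sum(y)
  lambda_x <- sum(x^2) / nx^2
  lambda_y <- sum(y^2) / ny^2
  1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * nx * ny)
}

# Toy relative-abundance profiles, for illustration only
a <- c(0.5, 0.3, 0.2)
b <- c(0.2, 0.3, 0.5)

d_same     <- morisita_horn(a, a)               # identical communities
d_disjoint <- morisita_horn(c(1, 0), c(0, 1))   # no shared taxa
d_ab       <- morisita_horn(a, b)
d_scaled   <- morisita_horn(10 * a, b)          # unaffected by rescaling
```

The index is 0 for identical profiles and 1 for profiles with no shared taxa, which is why the x-axis of Fig. 3B is labelled from low bias to high bias.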

# Add the sample ID and treatment method for each member of a pair.
# This could also be done by merging the metadata into `horn_dist_long`.
names <- data.frame(str_split_fixed(horn_dist_long$iso1, "_", 3))
names2 <- data.frame(str_split_fixed(horn_dist_long$iso2, "_", 3))
horn_dist_long$sample_id_1 <- paste(names$X1, names$X2, sep = "_")
horn_dist_long$method_1 <- ifelse(grepl("lyPMA", horn_dist_long$iso1),"lypma", 
                                         ifelse(grepl("ben", horn_dist_long$iso1),"benzonase", 
                                                ifelse(grepl("host", horn_dist_long$iso1),"host_zero", 
                                                       ifelse(grepl("qia", horn_dist_long$iso1),"qiaamp", 
                                                              ifelse(grepl("moly", horn_dist_long$iso1),"molysis", 
                                                                     "control")))))


# Repeat for iso2
horn_dist_long$sample_id_2 <- paste(names2$X1, names2$X2, sep = "_")
horn_dist_long$method_2 <-ifelse(grepl("lyPMA", horn_dist_long$iso2),"lypma", 
                                        ifelse(grepl("ben", horn_dist_long$iso2),"benzonase", 
                                               ifelse(grepl("host", horn_dist_long$iso2),"host_zero", 
                                                      ifelse(grepl("qia", horn_dist_long$iso2),"qiaamp", 
                                                             ifelse(grepl("moly", horn_dist_long$iso2),"molysis", 
                                                                    "control")))))


# Pool positive controls as "Mock" and negative controls as "Neg."; other sample IDs stay unchanged
horn_dist_long$sample_id_1 <- ifelse(grepl("pos", horn_dist_long$sample_id_1, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", horn_dist_long$sample_id_1, ignore.case = T),"Neg.",
                                        horn_dist_long$sample_id_1))
horn_dist_long$sample_id_2 <- ifelse(grepl("pos", horn_dist_long$sample_id_2, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", horn_dist_long$sample_id_2, ignore.case = T),"Neg.",
                                        horn_dist_long$sample_id_2))
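
The nested `ifelse()` calls that derive `method_1` and `method_2` apply the same pattern list twice. A hedged sketch of a lookup-based alternative follows; `classify_method` is a hypothetical helper, and the sample names in the demo are made up to resemble the `X1_X2_method` layout implied by `str_split_fixed()`, not taken from the data.

```r
# Hypothetical helper: map a sample name to a treatment method using the
# same substrings, in the same priority order, as the nested ifelse() calls.
classify_method <- function(ids) {
  patterns <- c(lypma     = "lyPMA",
                benzonase = "ben",
                host_zero = "host",
                qiaamp    = "qia",
                molysis   = "moly")
  out <- rep("control", length(ids))
  # Assign in reverse so that earlier patterns win when several match
  for (m in rev(names(patterns))) {
    out[grepl(patterns[[m]], ids, fixed = TRUE)] <- m
  }
  out
}

# Made-up sample names, for illustration only
ids <- c("BAL_01_lyPMA", "BAL_01_ben", "BAL_01_host",
         "BAL_01_qia", "BAL_01_moly", "BAL_01_control")
methods <- classify_method(ids)
methods
```

It could then be applied to both columns, for example `horn_dist_long$method_1 <- classify_method(horn_dist_long$iso1)`.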


# Keep only pairs that share a sample ID
horn_dist_long_within_sampleid_from_control <- subset(horn_dist_long, horn_dist_long$sample_id_1 == horn_dist_long$sample_id_2)

# Drop pairs in which both members used the same method
horn_dist_long_within_sampleid_from_control <- subset(horn_dist_long_within_sampleid_from_control,
                                                           horn_dist_long_within_sampleid_from_control$method_1 != horn_dist_long_within_sampleid_from_control$method_2)

# Keep only pairs in which one member is the untreated control
horn_dist_long_within_sampleid_from_control <- subset(horn_dist_long_within_sampleid_from_control, horn_dist_long_within_sampleid_from_control$method_1 == "control" | horn_dist_long_within_sampleid_from_control$method_2 == "control")


horn_dist_long_within_sampleid_from_control$treatment <- horn_dist_long_within_sampleid_from_control$method_1

horn_dist_long_within_sampleid_from_control$treatment <- ifelse(horn_dist_long_within_sampleid_from_control$treatment == "control", horn_dist_long_within_sampleid_from_control$method_2, horn_dist_long_within_sampleid_from_control$treatment) 
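
The filtering above keeps a pairwise distance only when both members come from the same sample and exactly one of them is the untreated control, then labels the pair by the non-control method. A toy example of the same logic in base R, using a made-up table (the object names `pairs` and `kept` are illustrative):

```r
# Made-up long-format distance table, for illustration only
pairs <- data.frame(
  sample_id_1 = c("S1", "S1", "S1", "S1", "S2"),
  sample_id_2 = c("S1", "S1", "S2", "S1", "S2"),
  method_1    = c("control", "lypma", "control", "control", "molysis"),
  method_2    = c("lypma", "benzonase", "lypma", "control", "control"),
  dist        = c(0.40, 0.10, 0.90, 0.00, 0.25),
  stringsAsFactors = FALSE
)

# Same sample on both sides of the pair
same_sample <- pairs$sample_id_1 == pairs$sample_id_2
# Exactly one member is the control: methods differ and one is "control"
one_control <- xor(pairs$method_1 == "control", pairs$method_2 == "control")

kept <- pairs[same_sample & one_control, ]
# Label each remaining pair by its non-control method
kept$treatment <- ifelse(kept$method_1 == "control",
                         kept$method_2, kept$method_1)
kept
```

Rows 2 to 4 are dropped: a treated-versus-treated pair, a cross-sample pair, and a control-versus-control pair.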


# Derive the sample type from the sample name
horn_dist_long_within_sampleid_from_control$sample_type <- ifelse(grepl("NS", horn_dist_long_within_sampleid_from_control$iso1), "Nasal",
                                                                  ifelse(grepl("CFB", horn_dist_long_within_sampleid_from_control$iso1), "Sputum",
                                                                         ifelse(grepl("BAL", horn_dist_long_within_sampleid_from_control$iso1), "BAL",
                                                                                ifelse(grepl("pos|POS", horn_dist_long_within_sampleid_from_control$iso1, ignore.case = T), "Mock",
                                                                                       ifelse(grepl("neg|N_EXT", horn_dist_long_within_sampleid_from_control$iso1), "Neg.",NA)))))

# Record which control sample each distance is measured from
horn_dist_long_within_sampleid_from_control <- horn_dist_long_within_sampleid_from_control %>% 
        mutate(dist_from = case_when(method_1 == "control" ~ iso1,
                                     method_2 == "control" ~ iso2))

dummy <- data.frame(iso1 = horn_dist_long_within_sampleid_from_control$dist_from %>% unique,
           iso2 = horn_dist_long_within_sampleid_from_control$dist_from %>% unique,
           dist = 0,
           treatment = "Untreated",
           method_1 = "control",
           method_2 = "control"
           )
names <- data.frame(str_split_fixed(dummy$iso1, "_", 3))
names2 <- data.frame(str_split_fixed(dummy$iso2, "_", 3))
dummy$sample_id_1 <- paste(names$X1, names$X2, sep = "_")
# Repeat for iso2
dummy$sample_id_2 <- paste(names2$X1, names2$X2, sep = "_")


# Pool positive controls as "Mock" and negative controls as "Neg."; other sample IDs stay unchanged
dummy$sample_id_1 <- ifelse(grepl("pos", dummy$sample_id_1, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", dummy$sample_id_1, ignore.case = T),"Neg.",
                                        dummy$sample_id_1))
dummy$sample_id_2 <- ifelse(grepl("pos", dummy$sample_id_2, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", dummy$sample_id_2, ignore.case = T),"Neg.",
                                        dummy$sample_id_2))
dummy$sample_type <- ifelse(grepl("NS", dummy$iso1), "Nasal",
                            ifelse(grepl("CFB", dummy$iso1), "Sputum",
                                   ifelse(grepl("BAL", dummy$iso1), "BAL",
                                          ifelse(grepl("pos|POS", dummy$iso1, ignore.case = T), "Mock",
                                                 ifelse(grepl("neg|N_EXT", dummy$iso1), "Neg.",NA)))))
dummy <- subset(dummy, !is.na(dummy$sample_type))
horn_dist_long_within_sampleid_from_control <- bind_rows(horn_dist_long_within_sampleid_from_control, dummy)

#Here, sample id is the same as subject id.
horn_dist_long_within_sampleid_from_control$subject_id <- horn_dist_long_within_sampleid_from_control$sample_id_1

horn_dist_long_within_sampleid_from_control$treatment <-
        factor(horn_dist_long_within_sampleid_from_control$treatment,
               levels = c("Untreated", "lypma", "benzonase", "host_zero", "molysis", "qiaamp"))

#Making figure of beta diversity distances

## This only includes samples (BAL, Nasal and Sputum)
f3b2 <- rbind(cbind("Sputum",
            lmer(dist ~ treatment + (1|subject_id),
                data = horn_dist_long_within_sampleid_from_control %>%
                        subset(sample_type == "Sputum")) %>% 
                    confint()
            ),
      cbind("BAL",
            lmer(dist ~ treatment + (1|subject_id),
                 data = horn_dist_long_within_sampleid_from_control %>%
                         subset(sample_type == "BAL")) %>% 
                    confint()
            ),
      cbind("Nasal",
            lmer(dist ~ treatment + (1|subject_id),
                 data = horn_dist_long_within_sampleid_from_control %>%
                         subset(sample_type == "Nasal")) %>% 
                    confint()
     )
) %>% {
        row_names <- rownames(.)
        data_frame <- data.frame(.)
        data_frame$treatment <- row_names
        data_frame %>% 
                remove_rownames() %>%
                rename(sample_type = "V1",
               "2.5%" = "X2.5..",
               "97.5%" = "X97.5..") %>% 
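                # cbind() with the sample-type label turned the limits into character
                mutate(`2.5%` = as.numeric(`2.5%`),
                       `97.5%` = as.numeric(`97.5%`),
                       # plotted point: midpoint of the 95% CI (stored as `mean`)
                       mean = (`97.5%`+`2.5%`)/2,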
                       treatment = case_when(treatment == "(Intercept)" ~ "Untreated",
                                             treatment == "treatmentlypma" ~ "lyPMA",
                                             treatment == "treatmentbenzonase" ~ "Benzonase",
                                             treatment == "treatmenthost_zero" ~ "HostZERO",
                                             treatment == "treatmentmolysis" ~ "MolYsis",
                                             treatment == "treatmentqiaamp" ~ "QIAamp"
                                             ),
                       treatment = factor(treatment,
                                          levels = c("Untreated", "lyPMA", "Benzonase",
                                      "HostZERO", "MolYsis", "QIAamp")),
                       sample_type = factor(sample_type,
                                          levels = c("BAL", "Nasal", "Sputum"))
                       ) %>%
                subset(treatment %in% c(#"Untreated", 
                                        "lyPMA", "Benzonase",
                                      "HostZERO", "MolYsis", "QIAamp"))
} %>%
        ggplot(aes(x = mean, y = treatment, col = treatment)) +
        geom_point(aes(x=mean), shape=15, size=3) +
        geom_linerange(aes(xmin=`2.5%`, xmax=`97.5%`)) +
        facet_wrap(~sample_type, nrow = 4) +
        scale_y_discrete(limits=rev) +
        scale_color_manual(values = c(#"#e31a1c",
                                      "#fb9a99","#33a02c",
                                      "#b2df8a","#1f78b4","#a6cee3"),
                           name = "Treatment",
                           breaks = c(#"Untreated", 
                                      "lyPMA", "Benzonase",
                                      "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        xlab("Morisita-Horn dissimilarity from untreated") +
        ylab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        theme(plot.tag = element_text(size = 15),
              axis.text.y = element_blank(),
              axis.ticks.y = element_blank(),
              legend.position = "none") +
        labs(tag = "B") +
        geom_vline(xintercept = 0, col = "black", linetype="dotted") +
        #coord_cartesian(xlim=c(-0.5, 1)) +
        #geom_text(aes(x = 0, label = treatment), hjust = 0, nudge_x = -.55, size = 3, color = "black", family = "sans") +
        #geom_text(aes(x = 0, label = text), hjust = 0, nudge_x = -0.4, size = 3, color = "black", family = "sans") +
        scale_x_continuous(breaks = c(-0.25, 0, 0.25, 0.5, 0.75),
                           labels = c(-0.25, "0 (low bias)", 0.25, 0.5, "0.75 (high bias)"))


dat_text <- data.frame(
  label = c(
          #"", "***", "***", "**", "***", #label for Mock
          "*", "", "", "", "", #label for BAL
          "*", "*", "", "**", "", 
          "**", "***", "***", "***", "***"),
  sample_type = c(
          #"Mock", "Mock", "Mock", "Mock", "Mock", 
          "BAL", "BAL", "BAL", "BAL", "BAL", 
          "Nasal", "Nasal", "Nasal", "Nasal", "Nasal", 
          "Sputum", "Sputum", "Sputum", "Sputum", "Sputum"),
  treatment     = c(
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp", 
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
  mean = c(
          #50, 30, 35, 33, 31,
          0.6, 0, 0, 0, 0,
          0.32, 0.3, 0, 0.37, 0,
          0.58, 0.75, 0.85, 0.83, 0.85)
)



dat_text$sample_type <- factor(dat_text$sample_type, levels = c("BAL", "Nasal", "Sputum"
                                                                ))
dat_text$treatment <- factor(dat_text$treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))


f3b2 <- f3b2 + geom_text(
  data    = dat_text,
  mapping = aes(y = treatment, x = mean, label = label),
  col = "black"
)


f3b2_box <- horn_dist_long_within_sampleid_from_control %>% 
        mutate(sample_type = factor(sample_type, levels=c(#"Mock", 
                                                    "BAL", "Nasal","Sputum"
                                                    ))) %>%
        subset(., .$sample_type != "Neg.") %>% 
        subset(., .$treatment != "Untreated") %>%
  mutate(treatment = factor(treatment, levels = c("lypma", "benzonase", "host_zero", "molysis", "qiaamp"),
                            labels = c("lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))) %>%
        ggplot(aes(x = dist, y = treatment, col = treatment)) +
        geom_boxplot(aes(x=dist)) +
        #geom_linerange(aes(xmin=`2.5%`, xmax=`97.5%`)) +
        facet_wrap(~sample_type, nrow = 4) +
        scale_y_discrete(limits=rev) +
        scale_color_manual(values = c(#"#e31a1c",
                                      "#fb9a99","#33a02c",
                                      "#b2df8a","#1f78b4","#a6cee3"),
                           name = "Treatment",
                           breaks = c(#"Untreated", 
                                      "lyPMA", "Benzonase",
                                      "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        xlab("Morisita-Horn dissimilarity from untreated") +
        ylab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        theme(plot.tag = element_text(size = 15),
              axis.text.y = element_blank(),
              axis.ticks.y = element_blank(),
              legend.position = "none") +
        labs(tag = "B") +
        geom_vline(xintercept = 0, col = "black", linetype="dotted") +
        #coord_cartesian(xlim=c(-0.5, 1)) +
        #geom_text(aes(x = 0, label = treatment), hjust = 0, nudge_x = -.55, size = 3, color = "black", family = "sans") +
        #geom_text(aes(x = 0, label = text), hjust = 0, nudge_x = -0.4, size = 3, color = "black", family = "sans") +
        scale_x_continuous(breaks = c(-0.25, 0, 0.25, 0.5, 0.75),
                           labels = c(-0.25, "0 (low bias)", 0.25, 0.5, "0.75 (high bias)"))


dat_text <- data.frame(
  label = c(
          #"", "***", "***", "**", "***", #label for Mock
          "*", "", "", "", "", #label for BAL
          "*", "*", "", "**", "", 
          "**", "***", "***", "***", "***"),
  sample_type = c(
          #"Mock", "Mock", "Mock", "Mock", "Mock", 
          "BAL", "BAL", "BAL", "BAL", "BAL", 
          "Nasal", "Nasal", "Nasal", "Nasal", "Nasal", 
          "Sputum", "Sputum", "Sputum", "Sputum", "Sputum"),
  treatment     = c(
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp", 
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
  dist = c(
          #50, 30, 35, 33, 31,
          0.9, 0, 0, 0, 0,
          0.48, 0.4, 0, 0.55, 0,
          0.8, 0.82,1, 1.02, 0.98)
)



dat_text$sample_type <- factor(dat_text$sample_type, levels = c("BAL", "Nasal", "Sputum"
                                                                ))
dat_text$treatment <- factor(dat_text$treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))


f3b2_box <- f3b2_box + geom_text(
  data    = dat_text,
  mapping = aes(y = treatment, x = dist, label = label),
  col = "black"
)



# Mock community's beta-diversity plot.
## This only includes positive control samples (Mock)

f5S_mock_mh <- horn_dist_long_within_sampleid_from_control %>% 
        # Lambda form: passing extra arguments through across() is deprecated in recent dplyr
        mutate(across(sample_type, ~ factor(.x, levels = c("Mock"#, 
                                                    #"BAL", "Nasal","Sputum"
                                                    )))) %>%
        subset(., .$sample_type != "Neg.") %>% 
        group_by(sample_type, treatment) %>%
        summarise(mean = mean(dist, na.rm = TRUE),
            sd = sd(dist, na.rm = TRUE),
            n = n()) %>%
        subset(., .$treatment != "Untreated") %>%
  mutate(se = sd / sqrt(n),
         lower.ci = mean - qt(1 - (0.05 / 2), n - 1) * se,
         upper.ci = mean + qt(1 - (0.05 / 2), n - 1) * se,
         treatment = factor(treatment, levels = c(#"Untreated",
                                                  "lypma", "benzonase", "host_zero", "molysis", "qiaamp"),
                            labels = c(#"Untreated",
                                       "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))) %>%
        #,
         #text = paste(sprintf("%.2f", round(mean, digits = 2)), " [", sprintf("%.2f", round(lower.ci, digits = 2)), ", ", sprintf("%.2f", round(upper.ci, digits = 2)), "]", sep = "")) %>%
        ggplot(aes(x = mean, y = treatment, col = treatment)) +
        geom_point(aes(x=mean), shape=15, size=3) +
        geom_linerange(aes(xmin=lower.ci, xmax=upper.ci)) +
        facet_wrap(~sample_type, nrow = 4) +
        scale_y_discrete(limits=rev) +
        scale_color_manual(values = c(#"#e31a1c", 
                                      "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", 
                           labels = c(#"Untreated",
                                      "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        xlab("Morisita-Horn dissimilarity from untreated") +
        ylab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        theme(plot.tag = element_text(size = 15),
              axis.text.y = element_blank(),
              axis.ticks.y = element_blank(),
              legend.position = "none") +
        labs(tag = "D") +
        geom_vline(xintercept = 0, col = "black", linetype="dotted") +
        #coord_cartesian(xlim=c(-0.5, 1)) +
        #geom_text(aes(x = 0, label = treatment), hjust = 0, nudge_x = -.55, size = 3, color = "black", family = "sans") +
        #geom_text(aes(x = 0, label = text), hjust = 0, nudge_x = -0.4, size = 3, color = "black", family = "sans") +
        scale_x_continuous(breaks = c(-0.5, 0, 0.5, 1, 1.5), labels = c(-0.5, "0 (low bias)", 0.5, 1, "1.5 (high bias)"))



f5S_mock_mh_box <- horn_dist_long_within_sampleid_from_control %>% 
        # Lambda form: passing extra arguments through across() is deprecated in recent dplyr
        mutate(across(sample_type, ~ factor(.x, levels = c("Mock"#, 
                                                    #"BAL", "Nasal","Sputum"
                                                    )))) %>%
        subset(., .$sample_type != "Neg.") %>% 
        subset(., .$treatment != "Untreated") %>%
  mutate(treatment = factor(treatment, levels = c(#"Untreated",
                                                  "lypma", "benzonase", "host_zero", "molysis", "qiaamp"),
                            labels = c(#"Untreated",
                                       "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))) %>%
        #,
         #text = paste(sprintf("%.2f", round(mean, digits = 2)), " [", sprintf("%.2f", round(lower.ci, digits = 2)), ", ", sprintf("%.2f", round(upper.ci, digits = 2)), "]", sep = "")) %>%
        ggplot(aes(x = dist, y = treatment, col = treatment)) +
        geom_boxplot() +
        facet_wrap(~sample_type, nrow = 4) +
        scale_y_discrete(limits=rev) +
        scale_color_manual(values = c(#"#e31a1c", 
                                      "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", 
                           labels = c(#"Untreated",
                                      "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        xlab("Morisita-Horn dissimilarity from untreated") +
        ylab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        theme(plot.tag = element_text(size = 15),
              axis.text.y = element_blank(),
              axis.ticks.y = element_blank(),
              legend.position = "none") +
        labs(tag = "D") +
        geom_vline(xintercept = 0, col = "black", linetype="dotted") +
        #coord_cartesian(xlim=c(-0.5, 1)) +
        #geom_text(aes(x = 0, label = treatment), hjust = 0, nudge_x = -.55, size = 3, color = "black", family = "sans") +
        #geom_text(aes(x = 0, label = text), hjust = 0, nudge_x = -0.4, size = 3, color = "black", family = "sans") +
        scale_x_continuous(breaks = c(-0.5, 0, 0.5, 1, 1.5), labels = c(-0.5, "0 (low bias)", 0.5, 1, "1.5 (high bias)"))


fig3 <- ggarrange(f3a, f3b2_box, ncol = 1, common.legend = T, align = "hv")

fig3

# png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/Figure3.png",   # Output file path
#     width = 180, # Width of the plot, in the units set below (mm)
#     height = 220, # Height of the plot, in the units set below (mm)
#     units = "mm",
#     res = 600
# ) #fixing multiple page issue
# 
# fig3
# # alpha diversity plots
# #ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
# #          ncol = 1) # alpha diversity plots
# 
# dev.off()



pdf(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/Figure3.pdf",   # Output file path
    width = 183 / 25.4, # Width: 183 mm converted to inches
    height = 220 / 25.4, # Height: 220 mm converted to inches
    paper = "special",   # Prevents default paper size settings
    onefile = FALSE      # Ensures a single page per PDF file
)


fig3
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
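
The error bars in the mock-community plot (f5S_mock_mh) are t-based 95% confidence intervals built from each group's mean, standard deviation, and sample size. A minimal base-R sketch of that calculation follows. `mean_ci` is a hypothetical helper name that does not appear in the original analysis, and the input values are made up for illustration.

```r
# Hypothetical helper: t-based confidence interval for a mean,
# mirroring mean -/+ qt(1 - alpha / 2, n - 1) * sd / sqrt(n).
mean_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                      # drop missing values before counting n
  n <- length(x)
  se <- sd(x) / sqrt(n)                  # standard error of the mean
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(mean = mean(x),
    lower.ci = mean(x) - half_width,
    upper.ci = mean(x) + half_width)
}

# Illustrative values only (not study data)
ci <- mean_ci(c(0.10, 0.25, 0.30, 0.45, NA))
ci
```

Unlike `n = n()` in the pipeline above, this helper counts only non-missing values, which matters if `dist` contains `NA` while the mean and SD are computed with `na.rm = TRUE`.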

vii. & viii. Do these host depletion methods introduce bias in the sequenced community?

  • Does bias differ by sample type?

Alpha diversity changes

Calculation for log centered Final reads

Distribution of centralized final reads

sample_data <- sample_data(phyloseq$phyloseq_count)
sample_data$log_centered_final_reads <- log(sample_data$Final_reads + 1) - median(log((subset(sample_data, sample_data$sample_type %in% c("BAL") & sample_data$treatment %in% c("Untreated")) %>% .$Final_reads) + 1))

sample_data$bal_log_centered_final_reads <- log(sample_data$Final_reads + 1) - median(log((subset(sample_data, sample_data$sample_type %in% c("BAL") & sample_data$treatment %in% c("Untreated"))%>% .$Final_reads) + 1))

sample_data$ns_log_centered_final_reads <- log(sample_data$Final_reads + 1) - median(log((subset(sample_data, sample_data$sample_type %in% c("Nasal") & sample_data$treatment %in% c("Untreated"))%>% .$Final_reads) + 1))

sample_data$spt_log_centered_final_reads <- log(sample_data$Final_reads + 1) - median(log((subset(sample_data, sample_data$sample_type %in% c("Sputum") & sample_data$treatment %in% c("Untreated"))%>% .$Final_reads) + 1))


subset(sample_data, sample_data$sample_type %in% c("BAL", "Nasal", "Sputum")) %>% .$log_centered_final_reads %>% hist(main = "Histogram of centered log final reads of BAL, Nasal, Sputum")

subset(sample_data, sample_data$sample_type %in% c("BAL")) %>% .$log_centered_final_reads %>% hist(main = "Histogram of centered log final reads of BAL")

subset(sample_data, sample_data$sample_type %in% c("Nasal")) %>% .$log_centered_final_reads %>% hist(main = "Histogram of centered log final reads of Nasal")

subset(sample_data, sample_data$sample_type %in% c("Sputum")) %>% .$log_centered_final_reads %>% hist(main = "Histogram of centered log final reads of Sputum")
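
The four centering lines above share one pattern: take log(reads + 1) and subtract the median of the same quantity in a reference group (untreated samples of one sample type). The sketch below shows that pattern as a single function. `log_center` and the toy data frame are assumptions made for illustration and are not part of the original analysis.

```r
# Hypothetical helper: log-transform read counts and center them on the
# median log count of a reference group.
log_center <- function(reads, is_reference) {
  logged <- log(reads + 1)               # natural log, as in the code above
  logged - median(logged[is_reference], na.rm = TRUE)
}

# Illustrative values only (not study data)
toy <- data.frame(
  sample_type = c("BAL", "BAL", "BAL", "Nasal"),
  treatment   = c("Untreated", "Untreated", "Untreated", "lyPMA"),
  Final_reads = c(9, 99, 999, 99)
)
ref <- toy$sample_type == "BAL" & toy$treatment == "Untreated"
toy$bal_log_centered_final_reads <- log_center(toy$Final_reads, ref)
toy
```

A value of 0 then means "same depth as the median untreated reference sample", and changing the reference group only requires a different logical vector.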

Species richness (all samples)

This table is redundant and was removed from the manuscript.

Effect size, standard error (SE), and p-value for species richness from a linear mixed-effects model with an interaction term (Species richness ~ sample_type * treatment + log10(Final_reads) + (1|subject_id)).

The interaction term was highly significant (p < 2.2e-16).

Raw result

lmer_sr <- lmer(S.obs ~ treatment * sample_type + log10(Final_reads) + (1|subject_id),
                    data = sample_data %>% 
                            data.frame %>% 
                            subset(., .$S.obs != 0))

lmer_sr %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## S.obs ~ treatment * sample_type + log10(Final_reads) + (1 | subject_id)
##    Data: sample_data %>% data.frame %>% subset(., .$S.obs != 0)
## 
## REML criterion at convergence: 933.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0347 -0.2709  0.0045  0.2423  3.0128 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 96.87    9.842   
##  Residual               53.26    7.298   
## Number of obs: 155, groups:  subject_id, 22
## 
## Fixed effects:
##                                      Estimate Std. Error       df t value
## (Intercept)                          -72.0854    14.4625  53.4034  -4.984
## treatmentlyPMA                        -2.2692     4.4631 108.5096  -0.508
## treatmentBenzonase                    -2.7922     4.4207 108.3433  -0.632
## treatmentHostZERO                     -4.1979     4.4402 108.4205  -0.945
## treatmentMolYsis                      -2.7594     4.4258 108.3638  -0.623
## treatmentQIAamp                       -3.1897     4.4276 108.3708  -0.720
## sample_typeMock                        8.0057    14.9720  18.7520   0.535
## sample_typeBAL                         5.5843    11.8180  20.2523   0.473
## sample_typeNasal                      -3.2587    11.0408  18.4140  -0.295
## sample_typeSputum                     11.2252    11.6526  19.2014   0.963
## log10(Final_reads)                    13.1570     1.7246 116.2668   7.629
## treatmentlyPMA:sample_typeMock         7.0987     6.5463 109.1116   1.084
## treatmentBenzonase:sample_typeMock   -13.4773     6.2512 108.3417  -2.156
## treatmentHostZERO:sample_typeMock    -11.8560     6.2612 108.3698  -1.894
## treatmentMolYsis:sample_typeMock      -9.2558     6.2557 108.3544  -1.480
## treatmentQIAamp:sample_typeMock      -13.6156     6.2504 108.3393  -2.178
## treatmentlyPMA:sample_typeBAL         -1.2644     6.7944 108.3380  -0.186
## treatmentBenzonase:sample_typeBAL     -3.0508     6.7836 109.2406  -0.450
## treatmentHostZERO:sample_typeBAL      -0.2777     6.7685 109.1960  -0.041
## treatmentMolYsis:sample_typeBAL        7.1574     6.8419 109.4064   1.046
## treatmentQIAamp:sample_typeBAL        -1.9740     6.8397 109.4004  -0.289
## treatmentlyPMA:sample_typeNasal        4.6417     6.3470 112.1444   0.731
## treatmentBenzonase:sample_typeNasal    0.4571     6.1682 112.2096   0.074
## treatmentHostZERO:sample_typeNasal     2.8911     6.2602 112.5293   0.462
## treatmentMolYsis:sample_typeNasal      6.1367     6.1502 111.7217   0.998
## treatmentQIAamp:sample_typeNasal      -2.7832     6.3453 112.1047  -0.439
## treatmentlyPMA:sample_typeSputum      32.7634     6.3974 108.3573   5.121
## treatmentBenzonase:sample_typeSputum  58.2631     6.5294 108.7077   8.923
## treatmentHostZERO:sample_typeSputum   85.1820     6.8449 109.4565  12.445
## treatmentMolYsis:sample_typeSputum    89.3757     7.1413 110.0631  12.515
## treatmentQIAamp:sample_typeSputum     69.7623     6.7477 109.2382  10.339
##                                      Pr(>|t|)    
## (Intercept)                          6.92e-06 ***
## treatmentlyPMA                         0.6122    
## treatmentBenzonase                     0.5290    
## treatmentHostZERO                      0.3465    
## treatmentMolYsis                       0.5343    
## treatmentQIAamp                        0.4728    
## sample_typeMock                        0.5991    
## sample_typeBAL                         0.6416    
## sample_typeNasal                       0.7712    
## sample_typeSputum                      0.3474    
## log10(Final_reads)                   7.12e-12 ***
## treatmentlyPMA:sample_typeMock         0.2806    
## treatmentBenzonase:sample_typeMock     0.0333 *  
## treatmentHostZERO:sample_typeMock      0.0609 .  
## treatmentMolYsis:sample_typeMock       0.1419    
## treatmentQIAamp:sample_typeMock        0.0315 *  
## treatmentlyPMA:sample_typeBAL          0.8527    
## treatmentBenzonase:sample_typeBAL      0.6538    
## treatmentHostZERO:sample_typeBAL       0.9674    
## treatmentMolYsis:sample_typeBAL        0.2978    
## treatmentQIAamp:sample_typeBAL         0.7734    
## treatmentlyPMA:sample_typeNasal        0.4661    
## treatmentBenzonase:sample_typeNasal    0.9411    
## treatmentHostZERO:sample_typeNasal     0.6451    
## treatmentMolYsis:sample_typeNasal      0.3205    
## treatmentQIAamp:sample_typeNasal       0.6618    
## treatmentlyPMA:sample_typeSputum     1.33e-06 ***
## treatmentBenzonase:sample_typeSputum 1.24e-14 ***
## treatmentHostZERO:sample_typeSputum   < 2e-16 ***
## treatmentMolYsis:sample_typeSputum    < 2e-16 ***
## treatmentQIAamp:sample_typeSputum     < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA result on LMER result for species richness

lmer_sr %>% anova()
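
The decision to stratify by sample type rests on the treatment-by-sample-type interaction in the ANOVA table. The sketch below shows one way to pull that single p-value out of an ANOVA table by row name. It is an illustration under stated assumptions: the built-in `ToothGrowth` data and `lm()` stand in for the study data and `lmer()` so that it runs without extra packages, and the 0.05 threshold is only an example.

```r
# Stand-in data and model: ToothGrowth ships with R; lm() replaces lmer()
# here only so that the sketch runs without extra packages.
tg <- transform(ToothGrowth, dose = factor(dose))
fit <- lm(len ~ supp * dose, data = tg)

aov_tab <- anova(fit)                            # sequential (Type I) F tests for lm
p_interaction <- aov_tab["supp:dose", "Pr(>F)"]  # row is named after the interaction term
stratify <- p_interaction < 0.05                 # TRUE would motivate per-stratum models
p_interaction
```

For a model fitted with lmerTest, `anova()` defaults to Type III tests with Satterthwaite degrees of freedom, so the table has different columns and the row name follows the term order in the formula (for example `treatment:sample_type`).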

Table S3. Species richness (stratified) without depth

Table S3. Effect size (95% CI) and p-value for species richness from linear mixed-effects models stratified by sample type (Species richness ~ treatment + (1|subject_id)). Analyses were stratified by sample type because the interaction between sample type and treatment was significant in an ANOVA test (p-value < 0.001) of the model lmer(species richness ~ sample type + treatment + sample type * treatment + (1|subject_id)). Statistical significance is denoted by *: p-value < 0.05, **: p-value < 0.01, and ***: p-value < 0.001.

LMER raw result - Mock

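# Mock samples are fitted with lm() (no random effect), unlike the lmer() models used for the other sample types
sr_lmer_mock <- lm(S.obs ~ treatment,
                    data = sample_data %>% 
                            data.frame %>% 
                            subset(., .$sample_type %in% c("Mock") & S.obs != 0))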

sr_lmer_mock %>% summary() 
## 
## Call:
## lm(formula = S.obs ~ treatment, data = sample_data %>% data.frame %>% 
##     subset(., .$sample_type %in% c("Mock") & S.obs != 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.400  -1.100  -0.200   1.067  13.600 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          40.667      2.560  15.883 1.43e-14 ***
## treatmentlyPMA       -5.267      3.798  -1.387 0.177739    
## treatmentBenzonase  -16.467      3.798  -4.336 0.000208 ***
## treatmentHostZERO   -15.667      3.798  -4.125 0.000359 ***
## treatmentMolYsis    -12.267      3.798  -3.230 0.003452 ** 
## treatmentQIAamp     -15.467      3.798  -4.073 0.000411 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.272 on 25 degrees of freedom
## Multiple R-squared:  0.5595, Adjusted R-squared:  0.4714 
## F-statistic:  6.35 on 5 and 25 DF,  p-value: 0.0006198

LMER raw result - BAL

sr_lmer_bal <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data %>% 
                            data.frame %>% 
                            subset(., .$sample_type %in% c("BAL") & S.obs != 0))
sr_lmer_bal %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL") &  
##     S.obs != 0)
## 
## REML criterion at convergence: 179
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.45497 -0.50930 -0.00625  0.43924  2.32013 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 90.44    9.510   
##  Residual               94.26    9.709   
## Number of obs: 28, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error     df t value Pr(>|t|)  
## (Intercept)           4.919      6.526 12.892   0.754   0.4646  
## treatmentlyPMA        1.750      6.865 18.131   0.255   0.8017  
## treatmentBenzonase    5.681      6.584 18.317   0.863   0.3994  
## treatmentHostZERO     8.881      6.584 18.317   1.349   0.1938  
## treatmentMolYsis     18.881      6.584 18.317   2.868   0.0101 *
## treatmentQIAamp       9.481      6.584 18.317   1.440   0.1667  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.526                            
## trtmntBnzns -0.570  0.521                     
## trtmntHZERO -0.570  0.521  0.565              
## trtmntMlYss -0.570  0.521  0.565  0.565       
## trtmntQIAmp -0.570  0.521  0.565  0.565  0.565

LMER raw result - Nasal

sr_lmer_ns <- lmer(S.obs ~ treatment + (1|subject_id),
                   data = sample_data %>% 
                           data.frame %>% 
                           subset(., .$sample_type %in% c("Nasal")))
sr_lmer_ns %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal"))
## 
## REML criterion at convergence: 179.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.93714 -0.54065 -0.09294  0.62090  3.07984 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept)  1.308   1.144   
##  Residual               18.894   4.347   
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)         10.4000     1.4213 28.7781   7.317 4.87e-08 ***
## treatmentlyPMA      -4.8090     2.4049 25.1935  -2.000 0.056431 .  
## treatmentBenzonase  -0.3632     2.4053 25.2997  -0.151 0.881180    
## treatmentHostZERO   10.0368     2.4053 25.2997   4.173 0.000311 ***
## treatmentMolYsis     6.2090     2.4049 25.1935   2.582 0.016026 *  
## treatmentQIAamp      7.7632     2.4053 25.2997   3.227 0.003440 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.553                            
## trtmntBnzns -0.553  0.314                     
## trtmntHZERO -0.553  0.314  0.347              
## trtmntMlYss -0.553  0.307  0.339  0.339       
## trtmntQIAmp -0.553  0.339  0.306  0.306  0.314

LMER raw result - Sputum

sr_lmer_spt <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data %>% 
                            data.frame %>% 
                            subset(., .$sample_type %in% c("Sputum")))
sr_lmer_spt %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum"))
## 
## REML criterion at convergence: 218.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0627 -0.4486 -0.1189  0.3630  1.7435 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 322.9    17.97   
##  Residual               245.7    15.68   
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          15.800     10.664   9.187   1.482  0.17192    
## treatmentlyPMA       37.600      9.914  20.000   3.793  0.00114 ** 
## treatmentBenzonase   66.600      9.914  20.000   6.718 1.55e-06 ***
## treatmentHostZERO   103.000      9.914  20.000  10.390 1.66e-09 ***
## treatmentMolYsis    112.800      9.914  20.000  11.378 3.46e-10 ***
## treatmentQIAamp      85.200      9.914  20.000   8.594 3.79e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.465                            
## trtmntBnzns -0.465  0.500                     
## trtmntHZERO -0.465  0.500  0.500              
## trtmntMlYss -0.465  0.500  0.500  0.500       
## trtmntQIAmp -0.465  0.500  0.500  0.500  0.500
sr_lmer_spt %>% confint()
##                        2.5 %    97.5 %
## .sig01              8.318892  36.73323
## .sigma             10.877929  19.02664
## (Intercept)        -4.977360  36.57737
## treatmentlyPMA     19.531174  55.66883
## treatmentBenzonase 48.531174  84.66883
## treatmentHostZERO  84.931174 121.06883
## treatmentMolYsis   94.731174 130.86883
## treatmentQIAamp    67.131174 103.26883

Tidy summarized table

sr_lmer_mock_kbl <-  sr_lmer_mock %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_mock %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        

sr_lmer_bal_kbl <-  sr_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
sr_lmer_ns_kbl <-  sr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


sr_lmer_spt_kbl <-  sr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        

tables3 <- cbind(#sr_lmer_mock_kbl,
                 sr_lmer_bal_kbl,
                 sr_lmer_ns_kbl,
                 sr_lmer_spt_kbl) %>%
    kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 1,
                           #"Mock" = 3,
                           "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")

tables3
             BAL                               Nasal swab                        Sputum
             Effect size (95% CI)  p-value     Effect size (95% CI)  p-value     Effect size (95% CI)  p-value
(Intercept)  4.9 (-7.3, 17.2)      0.465       10.4 (7.8, 13.0)      0.000 ***   15.8 (-5.0, 36.6)     0.172
lyPMA        1.7 (-10.7, 14.2)     0.802       -4.8 (-9.2, -0.4)     0.056       37.6 (19.5, 55.7)     0.001 **
Benzonase    5.7 (-6.2, 17.6)      0.399       -0.4 (-4.8, 4.1)      0.881       66.6 (48.5, 84.7)     0.000 ***
HostZERO     8.9 (-3.0, 20.8)      0.194       10.0 (5.6, 14.5)      0.000 ***   103.0 (84.9, 121.1)   0.000 ***
MolYsis      18.9 (7.0, 30.8)      0.010 *     6.2 (1.8, 10.6)       0.016 *     112.8 (94.7, 130.9)   0.000 ***
QIAamp       9.5 (-2.4, 21.4)      0.167       7.8 (3.3, 12.2)       0.003 **    85.2 (67.1, 103.3)    0.000 ***
save_kable(tables3, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS3.html", self_contained = T)
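
The four `*_kbl` blocks above repeat the same steps: take the coefficient table, attach the 95% confidence interval, add significance stars, and strip the `treatment` prefix from the row names. A minimal sketch of one reusable function is given below. `format_coef_table` is a hypothetical helper that is not part of the original analysis, and the example uses `lm()` on the built-in `PlantGrowth` data as a stand-in so that it runs without extra packages. It assumes the p-value is the last column of the coefficient matrix, which holds for `lm` and is expected to hold for lmerTest models.

```r
# Hypothetical helper: format a fitted model's coefficients as
# "estimate (lower, upper)", rounded p-value, and significance stars.
format_coef_table <- function(fit, digits = 1) {
  coefs <- coef(summary(fit))                  # Estimate, SE, ..., p-value
  ci <- confint(fit)                           # 95% confidence intervals
  ci <- ci[rownames(coefs), , drop = FALSE]    # keep rows matching the coefficients
  p <- coefs[, ncol(coefs)]                    # assumption: p-value is the last column
  fmt <- function(x) formatC(x, format = "f", digits = digits)
  stars <- cut(p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
               labels = c("***", "**", "*", ""), right = FALSE)
  data.frame(
    effect = paste0(fmt(coefs[, "Estimate"]), " (", fmt(ci[, 1]), ", ", fmt(ci[, 2]), ")"),
    p_value = round(p, 3),
    signif = as.character(stars),
    row.names = sub("^treatment", "", rownames(coefs)),
    stringsAsFactors = FALSE
  )
}

# Stand-in example: PlantGrowth ships with R; 'group' is renamed to 'treatment'
dat <- transform(PlantGrowth, treatment = group)
fit <- lm(weight ~ treatment, data = dat)
tab <- format_coef_table(fit)
tab
```

Applied to each stratified model in turn, the resulting data frames could be combined with `cbind()` as in the `tables3` call, which keeps the star thresholds and rounding rules in one place.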

Table S3 viral. Effect size, standard error (SE), and p-value for viral species richness from linear mixed-effects models stratified by sample type (Species richness ~ treatment + (1|subject_id)). Analyses were stratified by sample type because the interaction between sample type and treatment was significant in an ANOVA test (p-value < 0.001) of the model lmer(species richness ~ sample type + treatment + sample type * treatment + (1|subject_id)). Statistical significance is denoted by *: p-value < 0.05, **: p-value < 0.01, and ***: p-value < 0.001.

LMER raw result - BAL

sr_lmer_bal_v <- lmerTest::lmer(V.obs ~ treatment + (1|subject_id),
                    data = sample_data %>% 
                            data.frame %>% 
                           subset(., .$sample_type %in% c("BAL")))
sr_lmer_bal_v %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: V.obs ~ treatment + (1 | subject_id)
##    Data: sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL"))
## 
## REML criterion at convergence: 197.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.72110 -0.48044 -0.00556  0.48162  2.20074 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 113.3    10.64   
##  Residual               106.1    10.30   
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error     df t value Pr(>|t|)  
## (Intercept)           0.200      6.624 10.288   0.030   0.9765  
## treatmentlyPMA        0.400      6.515 20.000   0.061   0.9517  
## treatmentBenzonase    5.600      6.515 20.000   0.860   0.4002  
## treatmentHostZERO     6.600      6.515 20.000   1.013   0.3231  
## treatmentMolYsis     15.000      6.515 20.000   2.302   0.0322 *
## treatmentQIAamp      10.600      6.515 20.000   1.627   0.1194  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.492                            
## trtmntBnzns -0.492  0.500                     
## trtmntHZERO -0.492  0.500  0.500              
## trtmntMlYss -0.492  0.500  0.500  0.500       
## trtmntQIAmp -0.492  0.500  0.500  0.500  0.500
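
As a rough cross-check of the summary above, Wald-type 95% intervals can be computed from the printed estimates, standard errors and Satterthwaite degrees of freedom. This is only a sketch: these are t-based approximations, not the profile-likelihood intervals that `confint()` returns by default for an lmer fit, so the two will generally differ somewhat.

```r
# Estimates, standard errors and df copied from the BAL viral richness summary
coefs <- data.frame(
  term     = c("(Intercept)", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
  estimate = c(0.2, 0.4, 5.6, 6.6, 15.0, 10.6),
  se       = c(6.624, 6.515, 6.515, 6.515, 6.515, 6.515),
  df       = c(10.288, 20, 20, 20, 20, 20)
)

# Wald interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df = coefs$df)
coefs$lower <- coefs$estimate - t_crit * coefs$se
coefs$upper <- coefs$estimate + t_crit * coefs$se

print(cbind(term = coefs$term, round(coefs[, c("estimate", "lower", "upper")], 1)))
```

An interval built this way always brackets its own estimate, and it excludes zero exactly when the corresponding t-test is significant at the 0.05 level, so it is a quick sanity check on any tabulated confidence interval.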

LMER raw result - Nasal

sr_lmer_ns_v <- lmerTest::lmer(V.obs ~ treatment + (1|subject_id),
                   data = sample_data %>% 
                           data.frame %>% 
                           subset(., .$sample_type %in% c("Nasal")))
sr_lmer_ns_v %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: V.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal"))
## 
## REML criterion at convergence: 187
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.36349 -0.67373  0.06339  0.55537  1.43318 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 16.61    4.075   
##  Residual               16.70    4.086   
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          9.6000     1.8249 18.4329   5.261 4.91e-05 ***
## treatmentlyPMA      -6.4476     2.3678 22.2608  -2.723   0.0123 *  
## treatmentBenzonase  -3.8718     2.3780 22.5323  -1.628   0.1174    
## treatmentHostZERO    4.1282     2.3780 22.5323   1.736   0.0962 .  
## treatmentMolYsis    -0.1524     2.3678 22.2608  -0.064   0.9492    
## treatmentQIAamp      4.0718     2.3780 22.5323   1.712   0.1006    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.386                            
## trtmntBnzns -0.385  0.216                     
## trtmntHZERO -0.385  0.216  0.409              
## trtmntMlYss -0.386  0.191  0.377  0.377       
## trtmntQIAmp -0.385  0.377  0.181  0.181  0.216

LMER raw result - Sputum

sr_lmer_spt_v <- lmerTest::lmer(V.obs ~ treatment + (1|subject_id),
                    data = sample_data %>% 
                            data.frame %>% 
                            subset(., .$sample_type %in% c("Sputum")))
sr_lmer_spt_v %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: V.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum"))
## 
## REML criterion at convergence: 235.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9580 -0.5365 -0.1189  0.4298  1.8857 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 579.2    24.07   
##  Residual               804.2    28.36   
## Number of obs: 29, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)           9.193     17.977  14.856   0.511 0.616615    
## treatmentlyPMA        4.207     19.188  19.086   0.219 0.828770    
## treatmentBenzonase   19.207     19.188  19.086   1.001 0.329340    
## treatmentHostZERO    91.807     19.188  19.086   4.785 0.000127 ***
## treatmentMolYsis    118.407     19.188  19.086   6.171 6.13e-06 ***
## treatmentQIAamp      46.607     19.188  19.086   2.429 0.025183 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.601                            
## trtmntBnzns -0.601  0.563                     
## trtmntHZERO -0.601  0.563  0.563              
## trtmntMlYss -0.601  0.563  0.563  0.563       
## trtmntQIAmp -0.601  0.563  0.563  0.563  0.563

Tidy summarized table

sr_lmer_bal_kbl_v <-  sr_lmer_bal_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_bal_v %>% confint() %>%  # CIs from the viral BAL model
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
sr_lmer_ns_kbl_v <-  sr_lmer_ns_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_ns_v %>% confint() %>%  # CIs from the viral nasal model
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


sr_lmer_spt_kbl_v <-  sr_lmer_spt_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_spt_v %>% confint() %>%  # CIs from the viral sputum model
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
 
tables3 <- 
        
        rbind(
                cbind(Outcome = "Microbial species richness",
                      sr_lmer_bal_kbl %>% rownames_to_column("Treatment"), 
                      sr_lmer_ns_kbl, 
                      sr_lmer_spt_kbl) %>% remove_rownames() %>% as.matrix(),
                cbind(Outcome = "Viral species richness",
                      sr_lmer_bal_kbl_v %>% rownames_to_column("Treatment"), 
                      sr_lmer_ns_kbl_v, 
                      sr_lmer_spt_kbl_v) %>% remove_rownames() %>% as.matrix()
                ) %>%
    kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 2,
                           #"Mock" = 3,
                           "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")

tables3
BAL
Nasal swab
Sputum
Outcome Treatment Effect size (95% CI) p-value Effect size (95% CI) p-value Effect size (95% CI) p-value
Microbial species richness (Intercept) 4.9 (-7.3, 17.2) 0.465 10.4 (7.8, 13.0) 0.000 *** 15.8 (-5.0, 36.6) 0.172
Microbial species richness lyPMA 1.7 (-10.7, 14.2) 0.802 -4.8 (-9.2, -0.4) 0.056 37.6 (19.5, 55.7) 0.001 **
Microbial species richness Benzonase 5.7 (-6.2, 17.6) 0.399 -0.4 (-4.8, 4.1) 0.881 66.6 (48.5, 84.7) 0.000 ***
Microbial species richness HostZERO 8.9 (-3.0, 20.8) 0.194 10.0 (5.6, 14.5) 0.000 *** 103.0 (84.9, 121.1) 0.000 ***
Microbial species richness MolYsis 18.9 (7.0, 30.8) 0.010 * 6.2 (1.8, 10.6) 0.016 * 112.8 (94.7, 130.9) 0.000 ***
Microbial species richness QIAamp 9.5 (-2.4, 21.4) 0.167 7.8 (3.3, 12.2) 0.003 ** 85.2 (67.1, 103.3) 0.000 ***
Viral species richness (Intercept) 0.2 (-7.3, 17.2) 0.976 9.6 (7.8, 13.0) 0.000 *** 9.2 (-5.0, 36.6) 0.617
Viral species richness lyPMA 0.4 (-10.7, 14.2) 0.952 -6.4 (-9.2, -0.4) 0.012 * 4.2 (19.5, 55.7) 0.829
Viral species richness Benzonase 5.6 (-6.2, 17.6) 0.400 -3.9 (-4.8, 4.1) 0.117 19.2 (48.5, 84.7) 0.329
Viral species richness HostZERO 6.6 (-3.0, 20.8) 0.323 4.1 (5.6, 14.5) 0.096 91.8 (84.9, 121.1) 0.000 ***
Viral species richness MolYsis 15.0 (7.0, 30.8) 0.032 * -0.2 (1.8, 10.6) 0.949 118.4 (94.7, 130.9) 0.000 ***
Viral species richness QIAamp 10.6 (-2.4, 21.4) 0.119 4.1 (3.3, 12.2) 0.101 46.6 (67.1, 103.3) 0.025 *
save_kable(tables3, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS3_updated.html", self_contained = T)
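
The "Effect size (95% CI)" strings in the table can also be assembled in one step with `sprintf()`, which avoids the padding spaces that `format()` introduces and the `gsub()` calls needed to strip them. A minimal sketch, assuming a hypothetical helper name `fmt_ci`; the numbers are the microbial BAL MolYsis and nasal lyPMA rows of the table.

```r
# Build "estimate (lower, upper)" labels with a fixed number of decimals
# (fmt_ci is a hypothetical helper, not part of the original analysis).
fmt_ci <- function(estimate, lower, upper, digits = 1) {
  fmt <- paste0("%.", digits, "f")
  sprintf(paste0(fmt, " (", fmt, ", ", fmt, ")"), estimate, lower, upper)
}

labels <- fmt_ci(estimate = c(18.9, -4.8),
                 lower    = c(7.0, -9.2),
                 upper    = c(30.8, -0.4))
print(labels)
```

Since the function is vectorised, it could be applied to whole columns of estimates and interval bounds at once.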

Species richness - stratified - with depth

This table is redundant and was removed from the manuscript.

Sequencing depth-adjusted effect size (95% confidence interval) and p-value for species richness from linear mixed-effects models stratified by sample type (species richness ~ treatment + log10(final reads) + (1|subject_id)).

sr_lmer_bal_w_depth <- lmer(S.obs ~ treatment + bal_log_centered_final_reads + (1|subject_id), data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL") & S.obs != 0)) %>% 
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               std_error = "Std. Error",
               t_value = "t value") %>%
        # Wald 95% CI: estimate +/- 1.96 * standard error (not the t value)
        mutate("Effect size (95% CI)" = paste(round(Estimate, 1) %>% format(nsmall = 1), 
                                " (", 
                                round(Estimate - 1.96 * std_error, 1) %>% format(nsmall = 1),
                                ", ",
                                round(Estimate + 1.96 * std_error, 1) %>% format(nsmall = 1),
                                ")", 
                                sep = ""),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " "))
        
sr_lmer_ns_w_depth <- lmer(S.obs ~ treatment + ns_log_centered_final_reads + (1|subject_id), data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal"))) %>% 
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("ns_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               std_error = "Std. Error",
               t_value = "t value") %>%
        # Wald 95% CI: estimate +/- 1.96 * standard error (not the t value)
        mutate("Effect size (95% CI)" = paste(round(Estimate, 1) %>% format(nsmall = 1), 
                                " (", 
                                round(Estimate - 1.96 * std_error, 1) %>% format(nsmall = 1),
                                ", ",
                                round(Estimate + 1.96 * std_error, 1) %>% format(nsmall = 1),
                                ")", 
                                sep = ""),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " "))


sr_lmer_spt_w_depth <- lmer(S.obs ~ treatment + spt_log_centered_final_reads + (1|subject_id), data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum"))) %>% 
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>%
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("spt_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               std_error = "Std. Error",
               t_value = "t value") %>%
        # Wald 95% CI: estimate +/- 1.96 * standard error (not the t value)
        mutate("Effect size (95% CI)" = paste(round(Estimate, 1) %>% format(nsmall = 1), 
                                " (", 
                                round(Estimate - 1.96 * std_error, 1) %>% format(nsmall = 1),
                                ", ",
                                round(Estimate + 1.96 * std_error, 1) %>% format(nsmall = 1),
                                ")", 
                                sep = ""),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " "))
        

cbind(sr_lmer_bal_w_depth, sr_lmer_ns_w_depth, sr_lmer_spt_w_depth) %>%
    kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 1, "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")
BAL
Nasal swab
Sputum
Effect size (95% CI) p-value Effect size (95% CI) p-value Effect size (95% CI) p-value
(Intercept) 6.2 (3.7, 8.6) 0.230 11.1 (-10.6, 32.7) 0.000 *** 14.2 (11.4, 17.0) 0.190
lyPMA -4.0 (-5.5, -2.5) 0.457 -1.1 (-2.4, 0.2) 0.511 19.4 (16.2, 22.7) 0.113
Benzonase -6.8 (-9.1, -4.5) 0.248 -1.2 (-2.8, 0.3) 0.432 38.2 (33.1, 43.2) 0.018 *
HostZERO -5.6 (-7.5, -3.8) 0.358 4.3 (-0.3, 8.8) 0.028 * 46.7 (43.1, 50.4) 0.077
MolYsis 3.2 (2.1, 4.2) 0.612 4.8 (-1.2, 10.8) 0.006 ** 45.9 (42.8, 49.0) 0.132
QIAamp -6.4 (-8.5, -4.4) 0.309 0.4 (0.0, 0.9) 0.827 37.6 (34.2, 41.0) 0.099
log10(Final reads) 6.2 (-2.2, 14.7) 0.000 *** 3.0 (-8.9, 14.8) 0.000 *** 14.6 (9.9, 19.3) 0.026 *
#save_kable(tableS7, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS7.html", self_contained = T)

Beta diversity distances - Morisita Horn dissimilarity

PERMANOVA - all samples

This table is redundant and was removed from the manuscript.

Degrees of freedom, effect size (R2) and p-value of permutational ANOVA (PERMANOVA) for Morisita-Horn dissimilarity with an interaction term (Morisita-Horn dissimilarity of species composition ~ sample type * treatment + subject id). Statistical significance is denoted by ***: p-value < 0.001.
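
For reference, the Morisita-Horn dissimilarity between two abundance vectors can be written out in a few lines of base R. This is a sketch of the standard formula, 1 - 2 * sum(p_x * p_y) / (sum(p_x^2) + sum(p_y^2)) on relative abundances; `morisita_horn` is a hypothetical helper and the count vectors are made-up toy data, not samples from this study.

```r
# Morisita-Horn dissimilarity between two abundance vectors
# (morisita_horn is a hypothetical helper; the analysis itself uses distance(..., method = "horn")).
morisita_horn <- function(x, y) {
  stopifnot(length(x) == length(y), sum(x) > 0, sum(y) > 0)
  px <- x / sum(x)  # relative abundances of sample x
  py <- y / sum(y)  # relative abundances of sample y
  1 - 2 * sum(px * py) / (sum(px^2) + sum(py^2))
}

# Toy counts for four taxa in two samples (illustrative only)
a <- c(10, 5, 0, 1)
b <- c(8, 4, 1, 0)

d_ab <- morisita_horn(a, b)
print(d_ab)
```

The index is 0 for identical compositions and 1 for samples sharing no taxa, and because it works on proportions it is unaffected by differences in total read count between the two samples.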

set.seed(seed)

phyloseq_rel_nz_neg <- phyloseq$phyloseq_rel %>%
        subset_samples(S.obs != 0 & sample_type %in% c("Neg.", "Mock", 
                                                       "BAL", "Nasal", "Sputum"))

horn_perm_all_neg <- vegan::adonis2(by = "terms",
               distance(phyloseq_rel_nz_neg %>%
                                subset_samples(), method="horn") ~ bal + ns + spt + mock,
               data = phyloseq_rel_nz_neg %>% sample_data %>% data.frame(check.names = F) %>%
                       mutate(neg = ifelse(sample_type == "Neg.", 1, 0),
                              bal = ifelse(sample_type == "BAL", 1, 0),
                              ns = ifelse(sample_type == "Nasal", 1, 0),
                              spt = ifelse(sample_type == "Sputum", 1, 0),
                              mock = ifelse(sample_type == "Mock", 1, 0)),
               permutations = 10000)

set.seed(seed)

phyloseq_rel_nz_neg_decontam <- phyloseq_decontam$phyloseq_rel %>%
        subset_samples(S.obs != 0 & sample_type %in% c("Neg.", "Mock", 
                                                       "BAL", "Nasal", "Sputum"))

horn_perm_all_neg_decontam <- vegan::adonis2(by = "terms",
               distance(phyloseq_rel_nz_neg_decontam %>%
                                subset_samples(), method="horn") ~ bal + ns + spt + mock,
               data = phyloseq_rel_nz_neg_decontam %>% sample_data %>% data.frame(check.names = F) %>%
                       mutate(neg = ifelse(sample_type == "Neg.", 1, 0),
                              bal = ifelse(sample_type == "BAL", 1, 0),
                              ns = ifelse(sample_type == "Nasal", 1, 0),
                              spt = ifelse(sample_type == "Sputum", 1, 0),
                              mock = ifelse(sample_type == "Mock", 1, 0)),
               permutations = 10000)


set.seed(seed)
horn_perm_all <- vegan::adonis2(by = "terms",
                                distance(phyloseq_rel_nz, method="horn") ~ sample_type * treatment + subject_id,
                                  data = phyloseq_rel_nz %>% sample_data %>% data.frame(check.names = F),
                                  permutations = 10000)
set.seed(seed)
horn_perm_ns <- vegan::adonis2(by = "terms", distance(subset_samples(phyloseq_rel_nz, sample_type == "Nasal"), method="horn") ~ lypma + benzonase + host_zero + molysis + qiaamp,
                               data = subset_samples(phyloseq_rel_nz, sample_type == "Nasal") %>%
                                       sample_data %>% data.frame(check.names = F),
                               strata = subset_samples(phyloseq_rel_nz, sample_type == "Nasal") %>% 
                                       sample_data %>% data.frame(check.names = F) %>% .$subject_id, permutations = 10000)
set.seed(seed)
horn_perm_bal  <- vegan::adonis2(by = "terms", distance(subset_samples(phyloseq_rel_nz, sample_type == "BAL"), method="horn") ~  lypma + benzonase + host_zero + molysis + qiaamp,
                                 data = subset_samples(phyloseq_rel_nz, sample_type == "BAL") %>% sample_data %>% data.frame(check.names = F),
                                 strata = subset_samples(phyloseq_rel_nz, sample_type == "BAL") %>%
                                         sample_data %>% data.frame(check.names = F) %>% .$subject_id,
                                  permutations = 10000)
set.seed(seed)
horn_perm_spt <- vegan::adonis2(by = "terms", distance(subset_samples(phyloseq_rel_nz, sample_type == "Sputum"), method="horn") ~ lypma + benzonase + host_zero + molysis + qiaamp,
                                data = subset_samples(phyloseq_rel_nz, sample_type == "Sputum") %>% sample_data %>% data.frame(check.names = F),
                                strata = subset_samples(phyloseq_rel_nz, sample_type == "Sputum")
                                %>% sample_data %>% data.frame(check.names = F) %>% .$subject_id,
                                  permutations = 10000) 

PERMANOVA result

PERMANOVA(Morisita-Horn dissimilarity ~ bal + ns + spt + mock), with negative controls as the reference group.

All sample types were distinct.

horn_perm_all_neg

PERMANOVA result after removing potential contaminants (identified from decontam)

PERMANOVA(Morisita-Horn dissimilarity ~ bal + ns + spt + mock), with negative controls as the reference group.

All sample types were distinct.

horn_perm_all_neg_decontam

PERMANOVA result without negative control samples

horn_perm_all 

Tidy permanova result

tableS5_A <- horn_perm_all %>% data.frame(check.names = F) %>% rownames_to_column("row.names") %>% 
        mutate(row.names = case_when(row.names == "sample_type" ~ 'Sample type',
                                     row.names == "treatment" ~ 'Treatment',
                                     row.names == "subject_id" ~ 'Subject',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "sample_type:treatment" ~ 'Sample type * Treatment',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
        mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3))) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        mutate(`<i>p</i>-value` = format(`<i>p</i>-value`, nsmall = 3)) %>%
        dplyr::select(c("Degree of freedom", "R<sup>2</sup>", "<i>p</i>-value", " ")) %>% 
        kbl(format = "html", escape = 0) %>%
        kable_styling(full_width = 0, html_font = "sans")

tableS5_A
Degree of freedom R2 p-value
Sample type 3 0.528 0.000 ***
Treatment 5 0.031 0.000 ***
Subject 17 0.347 0.000 ***
Sample type * Treatment 15 0.052 0.000 ***
Residual 83 0.042 NA
Total 123 1.000 NA
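
With sequential terms, each R2 is that term's share of the total sum of squares, so the rows above should add up to the Total row. A small consistency check in base R, using only the rounded values printed in the table:

```r
# Degrees of freedom and R2 copied from the tidy PERMANOVA table (rounded to 3 digits)
perm <- data.frame(
  term = c("Sample type", "Treatment", "Subject",
           "Sample type * Treatment", "Residual"),
  df   = c(3, 5, 17, 15, 83),
  r2   = c(0.528, 0.031, 0.347, 0.052, 0.042)
)

total_df  <- sum(perm$df)  # should equal the Total row (123)
total_r2  <- sum(perm$r2)  # should be 1 up to rounding
explained <- 1 - perm$r2[perm$term == "Residual"]  # share explained by all model terms

print(c(total_df = total_df, total_r2 = total_r2, explained = explained))
```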

Table S6. PERMANOVA - Stratified

Raw result - BAL

horn_perm_bal

Raw result - Nasal

horn_perm_ns

Raw result - Sputum

horn_perm_spt

This table was added to the manuscript to assess treatment-induced bias in sputum samples.

Table S6. Effect size (R^2) and p-value of permutational ANOVA for Morisita-Horn distances of species composition (MH-distance ~ lyPMA + Benzonase + HostZERO + MolYsis + QIAamp, strata = subject_id). Analyses were stratified by sample type because the interaction term in (MH-distance ~ sample type + treatment + sample type * treatment, strata = subject_id) was significant (p < 0.001).

Tidy permanova result (stratified)

a <- horn_perm_bal %>% data.frame(check.names = F) %>% rownames_to_column('row.names') %>% 
        mutate(row.names = case_when(row.names == "lypma" ~ 'lyPMA',
                                     row.names == "benzonase" ~ 'Benzonase',
                                     row.names == "host_zero" ~ 'HostZERO',
                                     row.names == "molysis" ~ 'MolYsis',
                                     row.names == "qiaamp" ~ 'QIAamp',
                                     row.names == "subject_id" ~ 'Subject id',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
                mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3))) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        dplyr::select(c("R<sup>2</sup>", "<i>p</i>-value", " ")) 

b <- horn_perm_ns %>% data.frame(check.names = F) %>% rownames_to_column('row.names') %>% 
        mutate(row.names = case_when(row.names == "lypma" ~ 'lyPMA',
                                     row.names == "benzonase" ~ 'Benzonase',
                                     row.names == "host_zero" ~ 'HostZERO',
                                     row.names == "molysis" ~ 'MolYsis',
                                     row.names == "qiaamp" ~ 'QIAamp',
                                     row.names == "subject_id" ~ 'Subject id',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
                mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3))) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        dplyr::select(c("R<sup>2</sup>", "<i>p</i>-value", " ")) 

c <- horn_perm_spt %>% data.frame(check.names = F) %>% rownames_to_column('row.names') %>% 
        mutate(row.names = case_when(row.names == "lypma" ~ 'lyPMA',
                                     row.names == "benzonase" ~ 'Benzonase',
                                     row.names == "host_zero" ~ 'HostZERO',
                                     row.names == "molysis" ~ 'MolYsis',
                                     row.names == "qiaamp" ~ 'QIAamp',
                                     row.names == "subject_id" ~ 'Subject id',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
                mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3))) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        dplyr::select(c("R<sup>2</sup>", "<i>p</i>-value", " ")) 


tableS6 <- cbind(a, b, c) %>% 
        kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 1, "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")


save_kable(tableS6, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS6.html", self_contained = T)


tableS6
BAL
Nasal swab
Sputum
R2 p-value R2 p-value R2 p-value
lyPMA 0.006 0.302 0.058 0.004 ** 0.011 0.354
Benzonase 0.001 0.945 0.031 0.254 0.003 0.661
HostZERO 0.006 0.435 0.029 0.471 0.025 0.122
MolYsis 0.007 0.403 -0.005 0.975 0.055 0.023 *
QIAamp 0.014 0.039 * 0.053 0.030 * 0.144 0.000 ***
Residual 0.966 NA 0.835 NA 0.762 NA
Total 1.000 NA 1.000 NA 1.000 NA

Table S7. LM on M-H distance

Table S7. Effect size, standard error (SE) and p-value of a statistical test for the Morisita-Horn dissimilarity between the untreated sample and each treated sample within subject, stratified by sample type. Analyses were stratified by sample type because the interaction between sample type and treatment was significant (p-value < 0.001) in the model ANOVA(species richness ~ sample type + treatment + sample type * treatment). The test was conducted with a linear mixed-effects model, lmer(distance within subject by treatment ~ treatment + (1|subject)). The baseline of treatment is untreated (distance 0, untreated from untreated), and statistical significance is denoted by *: p-value < 0.05, **: p-value < 0.01 and ***: p-value < 0.001.
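
The baseline mentioned in the caption comes from the factor level order: with R's default treatment contrasts, the first level is the reference and every coefficient is a difference from it. A minimal sketch, using a made-up four-element vector rather than the study data:

```r
# Toy treatment labels (illustrative only); the level order mirrors the one used in this analysis
treatment <- factor(c("lypma", "Untreated", "qiaamp", "Untreated"),
                    levels = c("Untreated", "lypma", "benzonase",
                               "host_zero", "molysis", "qiaamp"))

reference_level <- levels(treatment)[1]              # level absorbed into the intercept
contrast_terms  <- colnames(contrasts(treatment))    # levels that get their own coefficient

print(reference_level)
print(contrast_terms)
```

Putting "Untreated" first is what makes each treatment coefficient in the model read as "change relative to the untreated sample".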

horn_dist_long_within_sampleid_from_control$treatment <- factor(horn_dist_long_within_sampleid_from_control$treatment,
                                                                     levels = c("Untreated",
                                                                                "lypma",
                                                                                "benzonase",
                                                                                "host_zero",
                                                                                "molysis",
                                                                                "qiaamp"))
horn_dist_long_within_sampleid_from_control$sample_type <- factor(horn_dist_long_within_sampleid_from_control$sample_type,
                                                                     levels = c("BAL",
                                                                                "Nasal",
                                                                                "Sputum"))

Raw lmer result on M-H distances

lmer(dist ~ treatment * sample_type + (1|subject_id),
     data = horn_dist_long_within_sampleid_from_control %>%
             subset(sample_type != "Mock")) %>%
        summary() 
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dist ~ treatment * sample_type + (1 | subject_id)
##    Data: horn_dist_long_within_sampleid_from_control %>% subset(sample_type !=  
##     "Mock")
## 
## REML criterion at convergence: -4.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.20682 -0.58433 -0.06776  0.47243  2.61620 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.02288  0.1512  
##  Residual               0.02599  0.1612  
## Number of obs: 89, groups:  subject_id, 19
## 
## Fixed effects:
##                                        Estimate Std. Error         df t value
## (Intercept)                           8.581e-16  1.105e-01  4.166e+01   0.000
## treatmentlypma                        3.252e-01  1.140e-01  5.672e+01   2.852
## treatmentbenzonase                    1.067e-01  1.140e-01  5.672e+01   0.936
## treatmenthost_zero                    2.736e-01  1.140e-01  5.672e+01   2.400
## treatmentmolysis                      2.039e-01  1.140e-01  5.672e+01   1.789
## treatmentqiaamp                       2.508e-01  1.140e-01  5.672e+01   2.200
## sample_typeNasal                     -8.635e-16  1.308e-01  4.166e+01   0.000
## sample_typeSputum                    -1.056e-15  1.483e-01  4.166e+01   0.000
## treatmentlypma:sample_typeNasal      -1.455e-01  1.472e-01  5.885e+01  -0.988
## treatmentbenzonase:sample_typeNasal   3.519e-02  1.475e-01  5.912e+01   0.239
## treatmenthost_zero:sample_typeNasal  -2.428e-01  1.475e-01  5.912e+01  -1.646
## treatmentmolysis:sample_typeNasal     1.194e-02  1.472e-01  5.885e+01   0.081
## treatmentqiaamp:sample_typeNasal     -9.850e-02  1.475e-01  5.912e+01  -0.668
## treatmentlypma:sample_typeSputum      2.215e-02  1.529e-01  5.672e+01   0.145
## treatmentbenzonase:sample_typeSputum  4.126e-01  1.529e-01  5.672e+01   2.697
## treatmenthost_zero:sample_typeSputum  3.383e-01  1.529e-01  5.672e+01   2.212
## treatmentmolysis:sample_typeSputum    3.987e-01  1.529e-01  5.672e+01   2.607
## treatmentqiaamp:sample_typeSputum     3.638e-01  1.529e-01  5.672e+01   2.378
##                                      Pr(>|t|)   
## (Intercept)                           1.00000   
## treatmentlypma                        0.00605 **
## treatmentbenzonase                    0.35332   
## treatmenthost_zero                    0.01972 * 
## treatmentmolysis                      0.07904 . 
## treatmentqiaamp                       0.03187 * 
## sample_typeNasal                      1.00000   
## sample_typeSputum                     1.00000   
## treatmentlypma:sample_typeNasal       0.32697   
## treatmentbenzonase:sample_typeNasal   0.81223   
## treatmenthost_zero:sample_typeNasal   0.10498   
## treatmentmolysis:sample_typeNasal     0.93565   
## treatmentqiaamp:sample_typeNasal      0.50682   
## treatmentlypma:sample_typeSputum      0.88538   
## treatmentbenzonase:sample_typeSputum  0.00919 **
## treatmenthost_zero:sample_typeSputum  0.03102 * 
## treatmentmolysis:sample_typeSputum    0.01166 * 
## treatmentqiaamp:sample_typeSputum     0.02078 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Justifying stratified analysis - ANOVA(LM(dist ~ sample type * treatment))

lmer(dist ~ treatment * sample_type + (1|subject_id),
     data = horn_dist_long_within_sampleid_from_control %>%
             subset(sample_type != "Mock")) %>%
        anova() 

Stratified analysis - BAL

mh_lmer_bal <- lmer(dist ~ treatment + (1|subject_id),
     data = horn_dist_long_within_sampleid_from_control %>%
             subset(sample_type == "BAL")) 

mh_lmer_bal %>% 
        summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dist ~ treatment + (1 | subject_id)
##    Data: horn_dist_long_within_sampleid_from_control %>% subset(sample_type ==  
##     "BAL")
## 
## REML criterion at convergence: 3.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8036 -0.4949 -0.1508  0.5054  2.2725 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.01796  0.1340  
##  Residual               0.03615  0.1901  
## Number of obs: 24, groups:  subject_id, 4
## 
## Fixed effects:
##                     Estimate Std. Error        df t value Pr(>|t|)  
## (Intercept)        2.261e-16  1.163e-01 1.161e+01   0.000   1.0000  
## treatmentlypma     3.252e-01  1.344e-01 1.500e+01   2.419   0.0288 *
## treatmentbenzonase 1.067e-01  1.344e-01 1.500e+01   0.794   0.4398  
## treatmenthost_zero 2.736e-01  1.344e-01 1.500e+01   2.035   0.0599 .
## treatmentmolysis   2.039e-01  1.344e-01 1.500e+01   1.517   0.1502  
## treatmentqiaamp    2.508e-01  1.344e-01 1.500e+01   1.866   0.0818 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtmntl trtmntb trtmn_ trtmntm
## tretmntlypm -0.578                               
## trtmntbnzns -0.578  0.500                        
## trtmnthst_z -0.578  0.500   0.500                
## trtmntmlyss -0.578  0.500   0.500   0.500        
## treatmntqmp -0.578  0.500   0.500   0.500  0.500

Stratified analysis - Nasal

mh_lmer_ns <- lmer(dist ~ treatment + (1|subject_id),
     data = horn_dist_long_within_sampleid_from_control %>%
             subset(sample_type == "Nasal")) 

mh_lmer_ns %>% 
        summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dist ~ treatment + (1 | subject_id)
##    Data: horn_dist_long_within_sampleid_from_control %>% subset(sample_type ==  
##     "Nasal")
## 
## REML criterion at convergence: -28.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4167 -0.6315 -0.1315  0.2369  2.1381 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.002114 0.04598 
##  Residual               0.013564 0.11646 
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                     Estimate Std. Error        df t value Pr(>|t|)   
## (Intercept)        7.015e-17  3.960e-02 2.775e+01   0.000  1.00000   
## treatmentlypma     1.751e-01  6.506e-02 2.378e+01   2.692  0.01279 * 
## treatmentbenzonase 1.600e-01  6.510e-02 2.400e+01   2.457  0.02160 * 
## treatmenthost_zero 4.886e-02  6.510e-02 2.400e+01   0.750  0.46027   
## treatmentmolysis   2.203e-01  6.506e-02 2.378e+01   3.386  0.00246 **
## treatmentqiaamp    1.342e-01  6.510e-02 2.400e+01   2.062  0.05018 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtmntl trtmntb trtmn_ trtmntm
## tretmntlypm -0.527                               
## trtmntbnzns -0.526  0.295                        
## trtmnthst_z -0.526  0.295   0.360                
## trtmntmlyss -0.527  0.282   0.346   0.346        
## treatmntqmp -0.526  0.346   0.280   0.280  0.295

Stratified analysis - Sputum

mh_lmer_spt <- lmer(dist ~ treatment + (1|subject_id),
     data = horn_dist_long_within_sampleid_from_control %>%
             subset(sample_type == "Sputum")) 

mh_lmer_spt %>% 
        summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dist ~ treatment + (1 | subject_id)
##    Data: horn_dist_long_within_sampleid_from_control %>% subset(sample_type ==  
##     "Sputum")
## 
## REML criterion at convergence: 5.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8788 -0.5690  0.0840  0.4564  1.8823 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.06615  0.2572  
##  Residual               0.03208  0.1791  
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                     Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)        1.186e-15  1.402e-01 7.345e+00   0.000  1.00000    
## treatmentlypma     3.473e-01  1.133e-01 2.000e+01   3.066  0.00610 ** 
## treatmentbenzonase 5.193e-01  1.133e-01 2.000e+01   4.584  0.00018 ***
## treatmenthost_zero 6.119e-01  1.133e-01 2.000e+01   5.402 2.75e-05 ***
## treatmentmolysis   6.026e-01  1.133e-01 2.000e+01   5.320 3.31e-05 ***
## treatmentqiaamp    6.146e-01  1.133e-01 2.000e+01   5.426 2.60e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtmntl trtmntb trtmn_ trtmntm
## tretmntlypm -0.404                               
## trtmntbnzns -0.404  0.500                        
## trtmnthst_z -0.404  0.500   0.500                
## trtmntmlyss -0.404  0.500   0.500   0.500        
## treatmntqmp -0.404  0.500   0.500   0.500  0.500
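
As a quick consistency check between the interaction model and the stratified fits, the sputum-specific treatment effect should equal the main effect (BAL is the reference sample type) plus the matching sputum interaction term. The sketch below copies the estimates from the outputs above by hand; the object names are illustrative only and do not appear in the analysis script.

```r
# Main treatment effects from the interaction model (reference sample type: BAL)
main_effect <- c(lypma = 0.3252, benzonase = 0.1067, host_zero = 0.2736,
                 molysis = 0.2039, qiaamp = 0.2508)

# treatment:sample_typeSputum interaction terms from the same model
sputum_interaction <- c(lypma = 0.02215, benzonase = 0.4126, host_zero = 0.3383,
                        molysis = 0.3987, qiaamp = 0.3638)

# Treatment effect within sputum = main effect + sputum interaction
sputum_effect <- main_effect + sputum_interaction

# Estimates from the sputum-only stratified model
sputum_stratified <- c(lypma = 0.3473, benzonase = 0.5193, host_zero = 0.6119,
                       molysis = 0.6026, qiaamp = 0.6146)

# Differences, rounded to three decimals
round(sputum_effect - sputum_stratified, 3)
```

The same sum for nasal swabs agrees only approximately with the nasal stratified fit (for example 0.3252 - 0.1455 = 0.1797 versus 0.1751 for lyPMA), which is plausibly due to the unbalanced nasal data (35 observations across 10 subjects).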

Viral beta diversity (LMER)

#Making subset of non-zero samples without neg

phyloseq_rel_nz_v <- v_phyloseq$viral_rel %>% 
        subset_samples(S.obs != 0 & sample_type %in% c("Mock", 
                                                       "BAL", "Nasal", "Sputum"))

#distances of betadiversity - boxplots
horn_dist_long <- distance(phyloseq_rel_nz_v, method="horn") %>% as.matrix() %>% melt_dist() #making long data of distance matrices

#Adding sample type and treatment name. 
#this can be also done by merging metadata into the `horn_dist_long`
names <- data.frame(str_split_fixed(horn_dist_long$iso1, "_", 3))
names2 <- data.frame(str_split_fixed(horn_dist_long$iso2, "_", 3))
horn_dist_long$sample_id_1 <- paste(names$X1, names$X2, sep = "_")
horn_dist_long$method_1 <- ifelse(grepl("lyPMA", horn_dist_long$iso1),"lypma", 
                                         ifelse(grepl("ben", horn_dist_long$iso1),"benzonase", 
                                                ifelse(grepl("host", horn_dist_long$iso1),"host_zero", 
                                                       ifelse(grepl("qia", horn_dist_long$iso1),"qiaamp", 
                                                              ifelse(grepl("moly", horn_dist_long$iso1),"molysis", 
                                                                     "control")))))


#Adding data for iso 2 also should be done
horn_dist_long$sample_id_2 <- paste(names2$X1, names2$X2, sep = "_")
horn_dist_long$method_2 <-ifelse(grepl("lyPMA", horn_dist_long$iso2),"lypma", 
                                        ifelse(grepl("ben", horn_dist_long$iso2),"benzonase", 
                                               ifelse(grepl("host", horn_dist_long$iso2),"host_zero", 
                                                      ifelse(grepl("qia", horn_dist_long$iso2),"qiaamp", 
                                                             ifelse(grepl("moly", horn_dist_long$iso2),"molysis", 
                                                                    "control")))))


#Relabelling positive (mock) and negative controls in the sample IDs
horn_dist_long$sample_id_1 <- ifelse(grepl("pos", horn_dist_long$sample_id_1, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", horn_dist_long$sample_id_1, ignore.case = T),"Neg.",
                                        horn_dist_long$sample_id_1))
horn_dist_long$sample_id_2 <- ifelse(grepl("pos", horn_dist_long$sample_id_2, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", horn_dist_long$sample_id_2, ignore.case = T),"Neg.",
                                        horn_dist_long$sample_id_2))


horn_dist_long_within_sampleid_from_control <- subset(horn_dist_long, horn_dist_long$sample_id_1 == horn_dist_long$sample_id_2) # data within samples

horn_dist_long_within_sampleid_from_control <- subset(horn_dist_long_within_sampleid_from_control,
                                                           horn_dist_long_within_sampleid_from_control$method_1 != horn_dist_long_within_sampleid_from_control$method_2) # remove irrelevant association

horn_dist_long_within_sampleid_from_control <- subset(horn_dist_long_within_sampleid_from_control, (horn_dist_long_within_sampleid_from_control$method_1 == "control") + (horn_dist_long_within_sampleid_from_control$method_2 == "control") != 0)


horn_dist_long_within_sampleid_from_control$treatment <- horn_dist_long_within_sampleid_from_control$method_1

horn_dist_long_within_sampleid_from_control$treatment <- ifelse(horn_dist_long_within_sampleid_from_control$treatment == "control", horn_dist_long_within_sampleid_from_control$method_2, horn_dist_long_within_sampleid_from_control$treatment) 


#Assigning sample type from the sample name
horn_dist_long_within_sampleid_from_control$sample_type <- ifelse(grepl("NS", horn_dist_long_within_sampleid_from_control$iso1), "Nasal",
                                                                  ifelse(grepl("CFB", horn_dist_long_within_sampleid_from_control$iso1), "Sputum",
                                                                         ifelse(grepl("BAL", horn_dist_long_within_sampleid_from_control$iso1), "BAL",
                                                                                ifelse(grepl("pos|POS", horn_dist_long_within_sampleid_from_control$iso1, ignore.case = T), "Mock",
                                                                                       ifelse(grepl("neg|N_EXT", horn_dist_long_within_sampleid_from_control$iso1), "Neg.",NA)))))

#Recording which untreated (control) sample each distance is measured from
horn_dist_long_within_sampleid_from_control <- horn_dist_long_within_sampleid_from_control %>% 
        mutate(dist_from = case_when(method_1 == "control" ~ iso1,
                                     method_2 == "control" ~ iso2))

dummy <- data.frame(iso1 = horn_dist_long_within_sampleid_from_control$dist_from %>% unique,
           iso2 = horn_dist_long_within_sampleid_from_control$dist_from %>% unique,
           dist = 0,
           treatment = "Untreated",
           method_1 = "control",
           method_2 = "control"
           )
names <- data.frame(str_split_fixed(dummy$iso1, "_", 3))
names2 <- data.frame(str_split_fixed(dummy$iso2, "_", 3))
dummy$sample_id_1 <- paste(names$X1, names$X2, sep = "_")
#Adding data for iso 2 also should be done
dummy$sample_id_2 <- paste(names2$X1, names2$X2, sep = "_")


#Relabelling positive (mock) and negative controls in the dummy rows
dummy$sample_id_1 <- ifelse(grepl("pos", dummy$sample_id_1, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", dummy$sample_id_1, ignore.case = T),"Neg.",
                                        dummy$sample_id_1))
dummy$sample_id_2 <- ifelse(grepl("pos", dummy$sample_id_2, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", dummy$sample_id_2, ignore.case = T),"Neg.",
                                        dummy$sample_id_2))
dummy$sample_type <- ifelse(grepl("NS", dummy$iso1), "Nasal",
                            ifelse(grepl("CFB", dummy$iso1), "Sputum",
                                   ifelse(grepl("BAL", dummy$iso1), "BAL",
                                          ifelse(grepl("pos|POS", dummy$iso1, ignore.case = T), "Mock",
                                                 ifelse(grepl("neg|N_EXT", dummy$iso1), "Neg.",NA)))))
dummy <- subset(dummy, !is.na(dummy$sample_type))
horn_dist_long_within_sampleid_from_control <- bind_rows(horn_dist_long_within_sampleid_from_control, dummy)

#Here, sample id is the same as subject id.
horn_dist_long_within_sampleid_from_control$subject_id <- horn_dist_long_within_sampleid_from_control$sample_id_1

horn_dist_long_within_sampleid_from_control$treatment <-
        factor(horn_dist_long_within_sampleid_from_control$treatment,
               levels = c("Untreated", "lypma", "benzonase", "host_zero", "molysis", "qiaamp"))


#mh_lmer_bal_v <- lmer(dist ~ treatment + (1|subject_id),
#     data = horn_dist_long_within_sampleid_from_control %>%
#             subset(sample_type == "BAL")) 

mh_lmer_ns_v <- lmer(dist ~ treatment + (1|subject_id),
     data = horn_dist_long_within_sampleid_from_control %>%
             subset(sample_type == "Nasal")) 

mh_lmer_spt_v <- lmer(dist ~ treatment + (1|subject_id),
     data = horn_dist_long_within_sampleid_from_control %>%
             subset(sample_type == "Sputum")) 
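
The filtering logic used above (keep only distances within one subject, where exactly one of the two samples is the untreated control) can be sketched on a toy table. This is a minimal base R illustration with made-up sample names and distances; `get_method()` and `get_subject()` are hypothetical helpers, not functions from the script, and the column names only mirror the `iso1`, `iso2` and `dist` columns used above.

```r
# Toy long-format distance table (values are invented for illustration)
toy <- data.frame(
  iso1 = c("NS_01_control", "NS_01_control", "NS_01_lyPMA", "NS_01_control"),
  iso2 = c("NS_01_lyPMA",   "NS_01_ben",     "NS_01_ben",   "NS_02_control"),
  dist = c(0.20, 0.10, 0.30, 0.90)
)

# Hypothetical helper: treatment label from the sample name
get_method <- function(x) {
  ifelse(grepl("lyPMA", x), "lypma",
         ifelse(grepl("ben", x), "benzonase", "control"))
}

# Hypothetical helper: subject ID = first two underscore-separated fields
get_subject <- function(x) sub("^([^_]+_[^_]+)_.*$", "\\1", x)

toy$method_1  <- get_method(toy$iso1)
toy$method_2  <- get_method(toy$iso2)
toy$subject_1 <- get_subject(toy$iso1)
toy$subject_2 <- get_subject(toy$iso2)

# Keep pairs from the same subject where exactly one side is the control
keep <- toy$subject_1 == toy$subject_2 &
        xor(toy$method_1 == "control", toy$method_2 == "control")
within_from_control <- toy[keep, ]

# The treatment is whichever side is not the control
within_from_control$treatment <- ifelse(within_from_control$method_1 == "control",
                                        within_from_control$method_2,
                                        within_from_control$method_1)
within_from_control[, c("iso1", "iso2", "dist", "treatment")]
```

Here the treated-versus-treated pair and the between-subject pair are dropped, leaving one row per treatment for subject `NS_01`.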

Tidy, summarized-stratified analysis
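
The table-building code in this section repeats the same steps for every model: add significance stars, align the confidence intervals with the coefficients by row name, and paste them into an "Effect size (95% CI)" column. A minimal base R sketch of those steps as one function is given here. `format_effects()` is a hypothetical helper that is not part of the script, the column name `sig` is an assumption, and the input numbers are made up. In real use, `coefs` would presumably be `summary(model)$coefficients` and `ci` would be `confint(model)` from the same model.

```r
# Hypothetical helper: format a coefficient matrix and its CI matrix for display
format_effects <- function(coefs, ci, digits = 1) {
  p <- coefs[, "Pr(>|t|)"]
  stars <- ifelse(p < 0.001, "***",
                  ifelse(p < 0.01, "**",
                         ifelse(p < 0.05, "*", " ")))
  fmt <- function(x) formatC(x, format = "f", digits = digits)
  # Align CI rows to the coefficient rows; this drops variance terms such as .sigma
  ci <- ci[rownames(coefs), , drop = FALSE]
  data.frame(
    `Effect size (95% CI)` = paste0(fmt(coefs[, "Estimate"]), " (",
                                    fmt(ci[, 1]), ", ", fmt(ci[, 2]), ")"),
    `p-value` = round(p, 3),
    sig = unname(stars),
    row.names = rownames(coefs),
    check.names = FALSE
  )
}

# Made-up inputs for illustration only
coefs <- matrix(c(0.00, 0.3473, 1.000, 0.006), nrow = 2,
                dimnames = list(c("(Intercept)", "treatmentlypma"),
                                c("Estimate", "Pr(>|t|)")))
ci <- rbind(.sigma = c(0.10, 0.20),
            `(Intercept)` = c(-0.30, 0.30),
            treatmentlypma = c(0.10, 0.60))
colnames(ci) <- c("2.5 %", "97.5 %")

format_effects(coefs, ci)
```

Passing the coefficients and intervals of one model through a single function call makes it harder to pair the estimates of one model with the intervals of another.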

mh_lmer_bal_kbl <-  mh_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              mh_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column("treatment") %>%
        mutate(treatment = case_when(treatment == "(Intercept)" ~ "Untreated",
                                             treatment == "treatmentlypma" ~ "lyPMA",
                                             treatment == "treatmentbenzonase" ~ "Benzonase",
                                             treatment == "treatmenthost_zero" ~ "HostZERO",
                                             treatment == "treatmentmolysis" ~ "MolYsis",
                                             treatment == "treatmentqiaamp" ~ "QIAamp"
                                             )) %>%
        column_to_rownames("treatment") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("Untreated",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


mh_lmer_ns_kbl <-  mh_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              mh_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                column_to_rownames("Row.names") %>%
        rownames_to_column("treatment") %>%
        mutate(treatment = case_when(treatment == "(Intercept)" ~ "Untreated",
                                             treatment == "treatmentlypma" ~ "lyPMA",
                                             treatment == "treatmentbenzonase" ~ "Benzonase",
                                             treatment == "treatmenthost_zero" ~ "HostZERO",
                                             treatment == "treatmentmolysis" ~ "MolYsis",
                                             treatment == "treatmentqiaamp" ~ "QIAamp"
                                             )) %>%
        column_to_rownames("treatment")  %>% 
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("Untreated",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


mh_lmer_spt_kbl <-  mh_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              mh_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                column_to_rownames("Row.names") %>%
        rownames_to_column("treatment") %>%
        mutate(treatment = case_when(treatment == "(Intercept)" ~ "Untreated",
                                             treatment == "treatmentlypma" ~ "lyPMA",
                                             treatment == "treatmentbenzonase" ~ "Benzonase",
                                             treatment == "treatmenthost_zero" ~ "HostZERO",
                                             treatment == "treatmentmolysis" ~ "MolYsis",
                                             treatment == "treatmentqiaamp" ~ "QIAamp"
                                             )) %>%
        column_to_rownames("treatment") %>% 
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("Untreated",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]



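#Placeholder for viral BAL: the viral BAL model is commented out above, so the
#microbial BAL table is reused as a template and every cell is replaced with "-" below
mh_lmer_bal_kbl_v <-  mh_lmer_bal %>%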
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              mh_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column("treatment") %>%
        mutate(treatment = case_when(treatment == "(Intercept)" ~ "Untreated",
                                             treatment == "treatmentlypma" ~ "lyPMA",
                                             treatment == "treatmentbenzonase" ~ "Benzonase",
                                             treatment == "treatmenthost_zero" ~ "HostZERO",
                                             treatment == "treatmentmolysis" ~ "MolYsis",
                                             treatment == "treatmentqiaamp" ~ "QIAamp"
                                             )) %>%
        column_to_rownames("treatment") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = "-",
               "<i>p</i>-value" = "-",
               " " = "") %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("Untreated",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


mh_lmer_ns_kbl_v <-  mh_lmer_ns_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              mh_lmer_ns_v %>% confint() %>% # confidence intervals from the viral model itself
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                column_to_rownames("Row.names") %>%
        rownames_to_column("treatment") %>%
        mutate(treatment = case_when(treatment == "(Intercept)" ~ "Untreated",
                                             treatment == "treatmentlypma" ~ "lyPMA",
                                             treatment == "treatmentbenzonase" ~ "Benzonase",
                                             treatment == "treatmenthost_zero" ~ "HostZERO",
                                             treatment == "treatmentmolysis" ~ "MolYsis",
                                             treatment == "treatmentqiaamp" ~ "QIAamp"
                                             )) %>%
        column_to_rownames("treatment")  %>% 
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("Untreated",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


mh_lmer_spt_kbl_v <-  mh_lmer_spt_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              mh_lmer_spt_v %>% confint() %>% # confidence intervals from the viral model itself
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                column_to_rownames("Row.names") %>%
        rownames_to_column("treatment") %>%
        mutate(treatment = case_when(treatment == "(Intercept)" ~ "Untreated",
                                             treatment == "treatmentlypma" ~ "lyPMA",
                                             treatment == "treatmentbenzonase" ~ "Benzonase",
                                             treatment == "treatmenthost_zero" ~ "HostZERO",
                                             treatment == "treatmentmolysis" ~ "MolYsis",
                                             treatment == "treatmentqiaamp" ~ "QIAamp"
                                             )) %>%
        column_to_rownames("treatment") %>% 
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("Untreated",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]



tableS7 <- 
        rbind(
                cbind(Outcome = "Microbial beta diversity",
                      mh_lmer_bal_kbl %>% rownames_to_column("Treatment"), 
                      mh_lmer_ns_kbl, 
                      mh_lmer_spt_kbl) %>% remove_rownames() %>% as.matrix(),
                cbind(Outcome = "Viral beta diversity",
                      mh_lmer_bal_kbl_v %>% rownames_to_column("Treatment"), 
                      mh_lmer_ns_kbl_v, 
                      mh_lmer_spt_kbl_v) %>% remove_rownames() %>% as.matrix()
                ) %>%
        
        kbl(format = "html", escape = 0) %>% 
        add_header_above(c(" " = 2, "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        #add_rownames(c("Microbial species" = 6, "Viral species"= 6)) %>% 
        kable_styling(full_width = 0, html_font = "sans")


save_kable(tableS7, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS7.html", self_contained = T)

tableS7
                                     BAL                              Nasal swab                       Sputum
Outcome                   Treatment  Effect size (95% CI)  p-value    Effect size (95% CI)  p-value    Effect size (95% CI)  p-value
Microbial beta diversity  Untreated  0.0 (-0.2, 0.2)       1.000      0.0 (-0.1, 0.1)       1.000      0.0 (-0.3, 0.3)       1.000
Microbial beta diversity  lyPMA      0.3 (0.1, 0.6)        0.029      0.2 (0.1, 0.3)        0.013      0.3 (0.1, 0.6)        0.006 **
Microbial beta diversity  Benzonase  0.1 (-0.1, 0.3)       0.440      0.2 (0.0, 0.3)        0.022      0.5 (0.3, 0.7)        0.000 ***
Microbial beta diversity  HostZERO   0.3 (0.0, 0.5)        0.060      0.0 (-0.1, 0.2)       0.460      0.6 (0.4, 0.8)        0.000 ***
Microbial beta diversity  MolYsis    0.2 (0.0, 0.4)        0.150      0.2 (0.1, 0.3)        0.002 **   0.6 (0.4, 0.8)        0.000 ***
Microbial beta diversity  QIAamp     0.3 (0.0, 0.5)        0.082      0.1 (0.0, 0.3)        0.050      0.6 (0.4, 0.8)        0.000 ***
Viral beta diversity      Untreated                                   0.0 (-0.1, 0.1)       1.000      0.0 (-0.3, 0.3)       1.000
Viral beta diversity      lyPMA                                       0.5 (0.1, 0.3)        0.000 ***  0.2 (0.1, 0.6)        0.232
Viral beta diversity      Benzonase                                   0.1 (0.0, 0.3)        0.215      0.3 (0.3, 0.7)        0.093
Viral beta diversity      HostZERO                                    0.1 (-0.1, 0.2)       0.339      0.7 (0.4, 0.8)        0.001 **
Viral beta diversity      MolYsis                                     0.2 (0.1, 0.3)        0.076      0.8 (0.4, 0.8)        0.001 ***
Viral beta diversity      QIAamp                                      0.4 (0.0, 0.3)        0.002 **   0.7 (0.4, 0.8)        0.002 **
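
The "Effect size (95% CI)" cells combine a rounded estimate with its confidence bounds. A minimal base R sketch of that formatting step follows; `format_effect()` is a hypothetical helper introduced here for illustration and is not part of the original analysis, and the input values are toy numbers.

```r
# Hypothetical helper (not in the original analysis): format an estimate and
# its confidence bounds as "estimate (lower, upper)" with a fixed number of
# decimals, so that values such as 0.3 and 0.30 print consistently.
format_effect <- function(estimate, lower, upper, digits = 1) {
  fmt <- function(x) formatC(round(x, digits), format = "f", digits = digits)
  paste0(fmt(estimate), " (", fmt(lower), ", ", fmt(upper), ")")
}

# Toy values, for illustration only
format_effect(estimate = 0.31, lower = 0.06, upper = 0.58)
```

Keeping the formatting in one function would make it easier to apply the same rounding to every sample type.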

Beta diversity plot on all samples

fig_sample_type <- ordinate(phyloseq_rel_nz_neg, method = "PCoA", distance = "horn") %>%
        plot_ordination(phyloseq_rel_nz_neg, ., col = "sample_type") +
        #scale_color_viridis(discrete = 6, name = "Treatment", labels = c("Mock theoretical", "Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        scale_color_manual(values = c("#a8ddb5", "#43a2ca", "#a50f15", "#fc9272", "#fee0d2"),
                           name = "Sample type",
                           breaks = c("Neg.","Mock", "BAL", "Nasal", "Sputum"),
                           labels = c("Neg.","Mock", "BAL", "Nasal", "Sputum")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        #scale_shape(name = "Sample type", labels = c("Mock theoretical", "Mock")) +
        geom_point(size = 3) +
        theme_classic (base_size = 12, base_family = "sans") +
        #facet_wrap(~sample_type, scales = "free") +
        #labs(tag = "A") +
        ggtitle("PCoA plot, M-H dissimilarity, on all samples") +
        theme(plot.tag = element_text(size = 15), legend.position = "top") #+
        #stat_ellipse(type = "norm", linetype = 2, linewidth = 0.1) +
        #stat_ellipse(type = "t", linewidth = 0.1)

fig_sample_type
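
The ordinations in this section request `distance = "horn"`, which is understood here to be the Morisita-Horn dissimilarity named in the plot titles. As a rough illustration of what that index measures, the sketch below computes it in base R on a toy abundance matrix and feeds the result to classical multidimensional scaling (PCoA). The function `morisita_horn()` and the toy data are assumptions made for this example; the analysis itself relies on `ordinate()`.

```r
# Morisita-Horn dissimilarity between two abundance vectors (sketch):
# d = 1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * N_x * N_y),
# where N is the sample total and lambda = sum(x^2) / N^2.
morisita_horn <- function(x, y) {
  n_x <- sum(x)
  n_y <- sum(y)
  lambda_x <- sum(x^2) / n_x^2
  lambda_y <- sum(y^2) / n_y^2
  1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * n_x * n_y)
}

# Toy abundance table: rows are samples, columns are taxa (illustrative only)
abund <- rbind(sample_a = c(10, 0, 5, 1),
               sample_b = c(8, 1, 6, 0),
               sample_c = c(0, 12, 1, 3),
               sample_d = c(1, 9, 0, 4))

# Pairwise dissimilarity matrix
n <- nrow(abund)
d <- matrix(0, n, n, dimnames = list(rownames(abund), rownames(abund)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- morisita_horn(abund[i, ], abund[j, ])
  }
}

# Classical PCoA on the dissimilarities
pcoa <- cmdscale(as.dist(d), k = 2)
pcoa
```

Because the index depends only on proportions, multiplying a sample by a constant leaves its dissimilarities unchanged, which is why it can be applied to relative abundances.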

Beta diversity plot of decontaminated samples

fig_sample_type_decontam <- ordinate(phyloseq_rel_nz_neg_decontam, method = "PCoA", distance = "horn") %>%
        plot_ordination(phyloseq_rel_nz_neg_decontam, ., col = "sample_type") +
        #scale_color_viridis(discrete = 6, name = "Treatment", labels = c("Mock theoretical", "Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        scale_color_manual(values = c("#a8ddb5", "#43a2ca", "#a50f15", "#fc9272", "#fee0d2"),
                           name = "Sample type",
                           breaks = c("Neg.","Mock", "BAL", "Nasal", "Sputum"),
                           labels = c("Neg.","Mock", "BAL", "Nasal", "Sputum")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        #scale_shape(name = "Sample type", labels = c("Mock theoretical", "Mock")) +
        geom_point(size = 3) +
        theme_classic (base_size = 12, base_family = "sans") +
        #facet_wrap(~sample_type, scales = "free") +
        #labs(tag = "A") +
        ggtitle("PCoA plot, M-H dissimilarity, on all samples (decontaminated)") +
        theme(plot.tag = element_text(size = 15), legend.position = "top") #+
        #stat_ellipse(type = "norm", linetype = 2, linewidth = 0.1) +
        #stat_ellipse(type = "t", linewidth = 0.1)

fig_sample_type_decontam

Fig. S7 PCoA figure

Fig. S7. Principal coordinate analysis plot based on Morisita-Horn dissimilarity of taxonomic sequencing results stratified by sample type.

figS7 <- ordinate(subset_samples(phyloseq_rel_nz, sample_type != "Neg." & sample_type != "Mock"), method = "PCoA", distance = "horn") %>%
        plot_ordination(phyloseq_rel_nz, ., col = "treatment") +
        #scale_color_viridis(discrete = 6, name = "Treatment", labels = c("Mock theoretical", "Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                           name = "Treatment",
                           breaks = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
                           labels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        #scale_shape(name = "Sample type", labels = c("Mock theoretical", "Mock")) +
        geom_point(size = 3) +
        theme_classic (base_size = 12, base_family = "sans") +
        facet_wrap(~sample_type, scales = "free") +
        theme(plot.tag = element_text(size = 15), legend.position = "top")# +
        #stat_ellipse(type = "norm") +
        #stat_ellipse(type = "t")

figS7

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS7.png",   # Output file path
    width = 180, # Plot width, in the units set below (mm)
    height = 90, # Plot height, in the units set below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue
figS7
dev.off()
## quartz_off_screen 
##                 2
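
The export above sizes the PNG in millimetres at 600 dpi. The sketch below shows how those settings translate into pixel dimensions and wraps the open, draw, close sequence so the device is closed even if drawing fails. Both `mm_to_px()` and `save_png_mm()` are hypothetical helpers written for this illustration, not functions used in the analysis.

```r
# Convert a physical size in millimetres to pixels at a given resolution (dpi)
mm_to_px <- function(mm, res) round(mm / 25.4 * res)

# Hypothetical wrapper: open a PNG device sized in mm, draw, and always close it
save_png_mm <- function(draw, file, width_mm, height_mm, res = 600) {
  png(file, width = width_mm, height = height_mm, units = "mm", res = res)
  on.exit(dev.off(), add = TRUE)  # runs even if draw() throws an error
  draw()                          # for a ggplot object: function() print(my_plot)
  invisible(file)
}

# Pixel dimensions implied by a 180 x 90 mm figure at 600 dpi
mm_to_px(c(180, 90), res = 600)
```

Inside a function a ggplot object is not printed automatically, so the drawing callback should call `print()` on it explicitly.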

Fig. S6. Mock community summary

Mock community controls show higher species richness than expected from their designed composition.

Plot of taxa that should not be found in mock samples

#Manipulating phyloseq - only top 10 
phyloseq_control_rel <- subset_samples(phyloseq_rel_nz, sample_type == "Mock" | sample_type == "Neg.")
phyloseq_control_rel_contam <- subset_taxa(phyloseq_control_rel,
                                           !(taxa_names(phyloseq_control_rel) %in%
                                                       head(
                                                               taxa_sums(
                                                                       subset_samples(
                                                                               phyloseq_control_rel,
                                                                               sample_type == "Mock" &
                                                                                       S.obs != 0)) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10))
)

phyloseq_control_rel_contam <- subset_taxa(phyloseq_control_rel_contam, taxa_sums(phyloseq_control_rel_contam) != 0)
phyloseq_control_rel_contam <- subset_samples(phyloseq_control_rel_contam, sample_type != "Neg." & S.obs != 0)


tax_table(phyloseq_control_rel_contam) %>%
        cbind(species20 = "[Others]") %>%
        {top20species <- head(taxa_sums(phyloseq_control_rel_contam) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 10)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- .[, 9] %>% gsub("s__", "", .) %>% gsub("_", " ", .) %>% paste("<i>", ., "</i>", sep = "")
   phyloseq_temp <- phyloseq_control_rel_contam
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        plot_bar(., fill="species20") + 
        ylab("Relative abundance") +
        theme_classic(base_size = 11, base_family = "serif") +
        ggtitle("Contaminants in Zymo mock (denominator is total microbes in samples)") +
        theme(legend.text = element_markdown()) +
        guides(fill=guide_legend(title="Top 10 species")) +
        facet_wrap (~ factor(treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")),
                    scales= "free_x", nrow=1)

#Manipulating phyloseq - only top 20 


tax_table(phyloseq_control_rel_contam) %>%
        cbind(species20 = "[Others]") %>%
        {top20species <- head(taxa_sums(phyloseq_control_rel_contam) %>%
                                data.frame %>%
                                arrange(-.) %>%
                                row.names(), 20)
   .[top20species, "species20"] <- as.character(.[top20species, "Species"])
   .[, 9] <- .[, 9] %>% gsub("s__", "", .) %>% gsub("_", " ", .) %>% paste("<i>", ., "</i>", sep = "")
   phyloseq_temp <- phyloseq_control_rel_contam
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp %>%
           transform_sample_counts(., function(x){x/sum(x)})
  } %>%
        plot_bar(., fill="species20") + 
        ylab("Relative abundance") +
        theme_classic(base_size = 11, base_family = "serif") +
        ggtitle("Contaminants in Zymo mock (denominator is total contaminants)") +
        theme(legend.text = element_markdown()) +
        guides(fill=guide_legend(title="Top 20 species")) +
        facet_wrap (~ factor(treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")),
                    scales= "free_x", nrow=1)

The proportion of unexpected taxa was consistent across all samples except one lyPMA-treated mock sample, which appears to have been contaminated with Cupriavidus sp. and Cutibacterium. In the other samples the composition of unexpected taxa looks similar, which suggests that they are taxa assigned the wrong species name rather than true contaminants, for the following reasons:

  1. If they were contaminants introduced after host depletion, all of them should also appear in the treated samples, but only some abundant taxa remained.

  2. If they were lab contaminants introduced before host depletion, their proportions should differ across samples, but they do not.

  3. Most of these taxa belong to the same genera as the mock community members (Cryptococcus, Listeria, Saccharomyces, Staphylococcus).

  4. Species richness decreased in the treated groups, so these taxa are unlikely to be contaminants from labware, host depletion, extraction, or sequencing. If they are contaminants at all, they were probably present in the mock community from the beginning.
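
The two bar plots in this section differ only in the denominator used for the unexpected taxa: total microbes in the sample versus total unexpected taxa. A minimal base R sketch of that difference, using a toy relative-abundance matrix with made-up taxon and sample names:

```r
# Toy relative abundances (columns sum to 1); names are illustrative only
rel <- matrix(c(0.70, 0.25, 0.03, 0.02,
                0.60, 0.30, 0.08, 0.02),
              nrow = 4,
              dimnames = list(c("expected_A", "expected_B",
                                "unexpected_C", "unexpected_D"),
                              c("mock_1", "mock_2")))

unexpected <- c("unexpected_C", "unexpected_D")

# Denominator = all microbes in the sample: keep the original proportions
share_of_total <- rel[unexpected, , drop = FALSE]

# Denominator = unexpected taxa only: rescale each sample to sum to 1
share_within <- sweep(share_of_total, 2, colSums(share_of_total), "/")

share_of_total
share_within
```

The first view shows how much of each sample is unexpected, while the second shows whether the makeup of the unexpected fraction is stable across samples, which is the comparison the reasoning above relies on.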

Mock community filtering

phyloseq_control_rel_10zymo <- subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Mock")
subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Mock") %>% taxa_names %>% .[c(grep("lacto", .))]
## [1] "Anaerococcus_lactolyticus"
zymo_speceis <- c("s__Pseudomonas_aeruginosa",
                  "s__Escherichia_coli", 
                  "s__Salmonella_enterica",
                  "s__Lactobacillus_fermentum",
                  "s__Enterococcus_faecalis",
                  "s__Staphylococcus_aureus",
                  "s__Listeria_monocytogenes",
                  "s__Bacillus_subtilis",
                  "s__Saccharomyces_cerevisiae",
                  "s__Cryptococcus_neoformans")



phyloseq_control_rel_10zymo <- subset_taxa(phyloseq_control_rel_10zymo,
                                           Species %in% zymo_speceis)

sample_data(phyloseq_control_rel_10zymo)$S.obs <-
        rowSums(t(otu_table(phyloseq_control_rel_10zymo)) != 0)


f5S_mock_filtered <- ggplot(subset(sample_data(phyloseq_control_rel_10zymo) %>% 
                             data.frame,
                          sample_data(phyloseq_control_rel_10zymo)$sample_type %in%
                                  c("Mock")),
                   aes(x = treatment, y = S.obs)) +
        geom_jitter(aes(color = treatment), position = position_jitter(0.2),
                    size = 1.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        ylab("Species richness") +
        xlab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        labs(tag = "C") +
        theme(plot.tag = element_text(size = 15),
              axis.text.x = element_blank(),
              axis.ticks.x = element_blank(),
              legend.position = "top") +
        ylim(c(0, 10)) +
        facet_wrap(~sample_type, nrow = 1) + 
        guides(col = guide_legend(nrow = 1))


phyloseq_control_rel_10zymo <- transform_sample_counts(phyloseq_control_rel_10zymo,
                                                       function(x){x/sum(x)})

barplot_mock_microbe_zymo_filtered <- phyloseq_control_rel_10zymo %>%
        tax_table() %>%
        {
   .[, 7] <- gsub("s__", " ", .[, 7])
   .[, 7] <- gsub("_", " ", .[, 7])
   .[, 7] <- gsub("[]]|[[]", "",  .[, 7])
   .[, 7] <- gsub(" sp", " sp.",  .[, 7])
   .[, 7] <- gsub(" sp.", "</i> sp.",  .[, 7])
   .[, 7] <- gsub(" group", "</i> group.",  .[, 7])
   .[, 7] <- ifelse(grepl("Other",.[, 7]),
                    "Other",
                    ifelse(grepl("</i>",  .[, 7]),
                           paste("<i>",  .[, 7], sep = ""),
                           paste("<i>",  .[, 7], "</i>", sep = "")) %>%
                            gsub("s__", "", .) %>%
                            gsub("_", " ", .)
   )
   phyloseq_temp <- subset_samples(phyloseq_control_rel_10zymo,
                              sample_type == "Mock") %>% 
           subset_samples(., S.obs != 0)
   tax_table(phyloseq_temp) <- tax_table(.) 
   phyloseq_temp
  } %>%
        my_plot_bar(., fill="Species") + 
        xlab("Sample") +
        ylab("") +
        theme_classic(base_size = 11, base_family = "sans") +
        theme(legend.text = element_markdown(size = 6),
              legend.key.size = unit(3, "mm"),
              legend.title = element_text(size = 6),
              axis.text.x = element_text(color = "white")) +
        guides(fill=guide_legend(title="Top 10 species")) +
        scale_fill_manual(values = c(RColorBrewer::brewer.pal(n = 9, name = "Paired"))) +
        facet_wrap (~ treatment, scales = "free_x", nrow = 1) +
        labs(tag = "A")
        #facet_wrap (~ factor(sample_type, levels = c("Mock", "BAL", "Nasal", "Sputum")),
        #            scales= "free_x", nrow=2) +
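
Two steps in this chunk are easy to check on a small example: species richness is the number of taxa with non-zero abundance in each sample, and species labels are cleaned and wrapped in `<i>` tags for the markdown legend. The sketch below reproduces both in base R on a toy table; the object names are hypothetical and the label cleaning is simplified compared with the chain of `gsub()` calls used in the analysis.

```r
# Toy abundance table: rows are taxa, columns are samples (illustrative only)
otu <- matrix(c(5, 0, 3,
                0, 0, 2,
                1, 4, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("s__Escherichia_coli",
                                "s__Bacillus_subtilis",
                                "s__Listeria_monocytogenes"),
                              c("mock_1", "mock_2", "mock_3")))

# Species richness: count of taxa with non-zero abundance per sample
richness <- colSums(otu != 0)

# Simplified label cleaning: drop the "s__" prefix, replace underscores,
# and wrap the name in italic tags
italic_label <- function(x) {
  paste0("<i>", gsub("_", " ", sub("^s__", "", x)), "</i>")
}

richness
italic_label(rownames(otu))
```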

Linear model (lm) test on species richness of mock - zymo species only

phyloseq_control_rel_10zymo
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 9 taxa and 31 samples ]
## sample_data() Sample Data:       [ 31 samples by 70 sample variables ]
## tax_table()   Taxonomy Table:    [ 9 taxa by 8 taxonomic ranks ]
sr_lmer_mock_zymo10 <- lm(S.obs ~ treatment,
                    data = sample_data(phyloseq_control_rel_10zymo) %>% 
                            data.frame %>% 
                            subset(., .$sample_type %in% c("Mock") & S.obs != 0))

sr_lmer_mock_zymo10 %>% summary() 
## 
## Call:
## lm(formula = S.obs ~ treatment, data = sample_data(phyloseq_control_rel_10zymo) %>% 
##     data.frame %>% subset(., .$sample_type %in% c("Mock") & S.obs != 
##     0))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -1.6   -0.2    0.0    0.0    1.6 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         9.000e+00  2.191e-01  41.079  < 2e-16 ***
## treatmentlyPMA     -4.000e-01  3.250e-01  -1.231     0.23    
## treatmentBenzonase -2.600e+00  3.250e-01  -8.001 2.34e-08 ***
## treatmentHostZERO  -2.800e+00  3.250e-01  -8.616 5.91e-09 ***
## treatmentMolYsis   -2.637e-15  3.250e-01   0.000     1.00    
## treatmentQIAamp    -2.642e-15  3.250e-01   0.000     1.00    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5367 on 25 degrees of freedom
## Multiple R-squared:  0.8663, Adjusted R-squared:  0.8396 
## F-statistic: 32.41 on 5 and 25 DF,  p-value: 3.735e-10
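
In this output the intercept is the mean richness of the reference group (Untreated) and each `treatment...` coefficient is the difference in mean richness from that reference. A minimal sketch with toy data (the values below are invented for illustration) makes the interpretation concrete:

```r
# Toy data: two treatment groups with three mock samples each (made-up values)
toy <- data.frame(
  treatment = factor(rep(c("Untreated", "Benzonase"), each = 3),
                     levels = c("Untreated", "Benzonase")),
  S.obs = c(9, 9, 9, 6, 7, 6)
)

fit <- lm(S.obs ~ treatment, data = toy)

# Intercept = mean of the reference level; slope = difference between group means
coef(fit)
group_means <- tapply(toy$S.obs, toy$treatment, mean)
group_means["Benzonase"] - group_means["Untreated"]
```

The reference level is the first factor level, so the order given in `levels =` determines which group the other treatments are compared against.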

Species richness of mock - zymo species only - figure

dat_text <- data.frame(
  label = c(
          "", "***", "***", "", ""#, #label for Mock
          #"", "", "", "*", "", #label for BAL
          #"", "", "***", "*", "**", 
          #"**", "***", "***", "***", "***"
          ),
  sample_type = c(
          "Mock", "Mock", "Mock", "Mock", "Mock"#, 
          #"BAL", "BAL", "BAL", "BAL", "BAL", 
          #"Nasal", "Nasal", "Nasal", "Nasal", "Nasal", 
          #"Sputum", "Sputum", "Sputum", "Sputum", "Sputum"
          ),
  treatment = c(
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"#, 
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"
          ),
  S.obs = c(
          9, 8.5, 7.5, 9, 9
          #30, 35, 50, 52, 50,
          #30, 30, 30, 35, 30,
          #100, 120, 147, 140, 125
          )
)



dat_text$sample_type <- factor(dat_text$sample_type, levels = c("Mock"#, 
                                                                #"BAL", "Nasal", "Sputum"
                                                                ))
dat_text$treatment <- factor(dat_text$treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))


f5S_mock_filtered <- f5S_mock_filtered + geom_text(
  data    = dat_text,
  mapping = aes(x = treatment, y = S.obs, label = label)
)
f5S_mock_filtered
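
The significance labels in `dat_text` are typed by hand. As a hedged alternative, the sketch below derives them from the p-values reported in the model summary above, assuming the conventional thresholds (*** below 0.001, ** below 0.01, * below 0.05). The object names `p_values`, `stars` and `labels_df` are hypothetical.

```r
# p-values copied from the lm() coefficient table for the mock samples
p_values <- c(lyPMA = 0.23, Benzonase = 2.34e-08, HostZERO = 5.91e-09,
              MolYsis = 1.00, QIAamp = 1.00)

# Map each p-value to a significance mark (thresholds are an assumption)
stars <- c("***", "**", "*", "")[findInterval(p_values, c(0.001, 0.01, 0.05)) + 1]

# Label table in the same shape as dat_text, without the y positions
labels_df <- data.frame(
  treatment = factor(names(p_values),
                     levels = c("Untreated", names(p_values))),
  label = stars,
  stringsAsFactors = FALSE
)
labels_df
```

Deriving the labels from the model keeps the figure annotation in step with the statistics if the data or the model change.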

Mock community figure

Alpha and beta diversity indices were calculated for mock community samples and merged into Fig. S5.

ggarrange(barplot_mock_microbe_zymo_filtered,
          barplot_mock_gramproportion,
          f5S_mock_filtered,
          f5S_mock_mh_box,
          align = "hv",
          heights = c(2,2,3,2),
          widths = c(1, 1, 0.8,1),
          ncol=1, nrow=4)

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS5.png",   # Output file path
    width = 190, # Plot width, in the units set below (mm)
    height = 200, # Plot height, in the units set below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue

ggarrange(barplot_mock_microbe_zymo_filtered,
          barplot_mock_gramproportion,
          f5S_mock_filtered,
          f5S_mock_mh_box,
          align = "hv",
          heights = c(2,2,3,2),
          widths = c(1, 1, 0.8,1),
          ncol=1, nrow=4)



dev.off()
## quartz_off_screen 
##                 2

iv. Mediation analysis

mediation analysis

Table S4. Mediation analysis (treatment-stratified)

outcome = S.obs
exposure = treatment (stratified)
mediator = Final_reads
mediator-outcome confounders = sample_type
exposure-mediator confounders = NA
outcome model = Mixed effects linear regression
mediator model = Mixed effects linear regression

Mediation analysis was conducted separately for each treatment.

Table S4. Mediation analysis results: estimated effect sizes and p-values for the indirect effect, direct effect, and proportion mediated. The analysis used treatment as the exposure, log10(final reads) as the mediator, species richness as the outcome, and sample type as a mediator-outcome confounder. The analysis was stratified by treatment, with each treatment coded as a binary variable.
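
To make the terms indirect effect, direct effect, and proportion mediated concrete, the sketch below works through the simplest linear case in base R on simulated data. It is a deliberately simplified illustration under stated assumptions: ordinary least squares, no treatment-by-mediator interaction, and no random effects. It is not the mixed-model, simulation-based procedure used in this analysis, and all variable names and coefficients are invented.

```r
set.seed(1)

# Simulated data: a binary treatment shifts the mediator, and both affect the outcome
n <- 200
treat <- rep(0:1, each = n / 2)
mediator <- 0.8 * treat + rnorm(n)
outcome <- 1.5 * treat + 2 * mediator + rnorm(n)

fit_m <- lm(mediator ~ treat)             # mediator model
fit_y <- lm(outcome ~ treat + mediator)   # outcome model
fit_total <- lm(outcome ~ treat)          # total effect model

# Indirect effect: path treat -> mediator times path mediator -> outcome
indirect <- unname(coef(fit_m)["treat"] * coef(fit_y)["mediator"])
# Direct effect: treat -> outcome holding the mediator fixed
direct <- unname(coef(fit_y)["treat"])
# Total effect and the share of it that runs through the mediator
total <- unname(coef(fit_total)["treat"])
prop_mediated <- indirect / total

c(indirect = indirect, direct = direct, total = total, prop_mediated = prop_mediated)
```

In this linear setting the total effect equals the sum of the direct and indirect effects. With an interaction term and mixed models the effects are instead estimated separately for the control and treated conditions.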

## lypma
# only mediator-outcome confounders
detach_package <- function(pkg, character.only = FALSE)
{
  if(!character.only)
  {
    pkg <- deparse(substitute(pkg))
  }
  search_item <- paste("package", pkg, sep = ":")
  while(search_item %in% search())
  {
    detach(search_item, unload = TRUE, character.only = TRUE)
  }
}

detach_package(lmerTest)

## all treatment groups

# Filter on the sample_type column of the data frame being subset
sample_data_respiratory <- subset(phyloseq_rel_nz %>% sample_data %>% data.frame, sample_type %in% c("Sputum", "BAL", "Nasal"))


med.fit <- lmer(log10(Final_reads) ~ lypma +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$lypma == 1))

out.fit <- lmer(S.obs ~ lypma * log10(Final_reads) + sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$lypma == 1))
set.seed(seed)
med.out.lypma <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "lypma",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.lypma)

final.out.lypma <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## benzonase


med.fit <- lmer(log10(Final_reads) ~ benzonase + sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$benzonase == 1))

out.fit <- lmer(S.obs ~ benzonase * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$benzonase == 1))

set.seed(seed)
med.out.benzonase <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "benzonase",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.benzonase)

final.out.benzonase <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## host zero


med.fit <- lmer(log10(Final_reads) ~ host_zero +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$host_zero == 1))

out.fit <- lmer(S.obs ~ host_zero * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$host_zero == 1))
set.seed(seed)
med.out.hostzero <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "host_zero",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.hostzero)

final.out.host_zero <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))


## molysis


med.fit <- lmer(log10(Final_reads) ~ molysis +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$molysis == 1))

out.fit <- lmer(S.obs ~ molysis * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$molysis == 1))
set.seed(seed)
med.out.molysis <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "molysis",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.molysis)

final.out.molysis <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## qiaamp


med.fit <- lmer(log10(Final_reads) ~ qiaamp +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$qiaamp == 1))

out.fit <- lmer(S.obs ~ qiaamp * log10(Final_reads) + sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$qiaamp == 1))

set.seed(seed)
med.out_qiaamp <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "qiaamp",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out_qiaamp)

final.out.qiaamp <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))
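
The five blocks above differ only in the name of the treatment indicator. One way to reduce the repetition, sketched here under the assumption that the model structure is identical for every treatment, is to build the two formulas from the treatment name. `make_mediation_formulas()` is a hypothetical helper, and the sketch only constructs the formulas; fitting them and running the mediation step would still follow the code above.

```r
treatments <- c("lypma", "benzonase", "host_zero", "molysis", "qiaamp")

# Hypothetical helper: build the mediator and outcome formulas for one treatment
make_mediation_formulas <- function(treatment) {
  list(
    mediator = as.formula(paste(
      "log10(Final_reads) ~", treatment, "+ sample_type + (1 | subject_id)")),
    outcome = as.formula(paste(
      "S.obs ~", treatment, "* log10(Final_reads) + sample_type + (1 | subject_id)"))
  )
}

# One pair of formulas per treatment, named by treatment
formulas <- lapply(setNames(treatments, treatments), make_mediation_formulas)
formulas$lypma$outcome
```

The data subset for each treatment (controls plus samples with that treatment) would need to be built in the same loop.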

Raw mediation outputs - lyPMA

med.out.lypma %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value  
## ACME (control)             0.5058      -3.1990         4.95   0.781  
## ACME (treated)             1.6851      -9.3365        13.54   0.759  
## ADE (control)             11.3959       0.3340        22.95   0.043 *
## ADE (treated)             12.5751       1.2206        24.90   0.028 *
## Total Effect              13.0810       0.3890        26.65   0.043 *
## Prop. Mediated (control)   0.0214      -0.7431         0.46   0.756  
## Prop. Mediated (treated)   0.1321      -2.3171         1.17   0.729  
## ACME (average)             1.0955      -5.9701         8.64   0.759  
## ADE (average)             11.9855       1.2858        22.91   0.027 *
## Prop. Mediated (average)   0.0768      -1.5381         0.77   0.729  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Raw mediation outputs - Benzonase

med.out.benzonase %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value   
## ACME (control)             8.3239      -2.1981        21.41  0.1332   
## ACME (treated)            14.6853      -0.6295        34.13  0.0630 . 
## ADE (control)              8.7025     -11.0395        27.78  0.3824   
## ADE (treated)             15.0638      -3.8177        34.70  0.1208   
## Total Effect              23.3877       7.6197        40.06  0.0038 **
## Prop. Mediated (control)   0.3463      -0.1061         1.32  0.1370   
## Prop. Mediated (treated)   0.6117      -0.0421         1.91  0.0668 . 
## ACME (average)            11.5046       1.3646        24.40  0.0224 * 
## ADE (average)             11.8832      -5.1716        29.09  0.1738   
## Prop. Mediated (average)   0.4790       0.0584         1.46  0.0262 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 35 
## 
## 
## Simulations: 10000

Raw mediation outputs - HostZERO

med.out.hostzero %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)             24.110       -1.084        52.51  0.0626 .  
## ACME (treated)             34.365       10.979        62.01  0.0030 ** 
## ADE (control)               4.994      -22.338        31.79  0.7176    
## ADE (treated)              15.249      -16.203        46.36  0.3390    
## Total Effect               39.359       20.836        58.64  0.0002 ***
## Prop. Mediated (control)    0.608       -0.028         1.57  0.0628 .  
## Prop. Mediated (treated)    0.868        0.277         1.79  0.0032 ** 
## ACME (average)             29.238        9.775        51.95  0.0014 ** 
## ADE (average)              10.121      -14.700        35.03  0.4274    
## Prop. Mediated (average)    0.738        0.243         1.57  0.0016 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 35 
## 
## 
## Simulations: 10000

Raw mediation outputs - MolYsis

med.out.molysis %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)            16.6842      -1.6638        39.14   0.077 .  
## ACME (treated)            36.1230      17.3570        59.33  <2e-16 ***
## ADE (control)              8.9818     -11.1781        28.19   0.364    
## ADE (treated)             28.4205       4.3635        55.21   0.022 *  
## Total Effect              45.1047      26.3931        64.04  <2e-16 ***
## Prop. Mediated (control)   0.3659      -0.0373         0.88   0.077 .  
## Prop. Mediated (treated)   0.7951       0.4294         1.31  <2e-16 ***
## ACME (average)            26.4036      11.8607        44.49  <2e-16 ***
## ADE (average)             18.7011       0.4694        37.76   0.041 *  
## Prop. Mediated (average)   0.5805       0.2833         0.99  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 35 
## 
## 
## Simulations: 10000

Raw mediation outputs - QIAamp

med.out_qiaamp %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)            22.3244      -2.1803        49.12   0.077 .  
## ACME (treated)            20.2975      -4.2164        46.66   0.111    
## ADE (control)             12.0883     -15.5013        39.45   0.398    
## ADE (treated)             10.0614     -19.4275        38.73   0.506    
## Total Effect              32.3858      17.1994        48.04  <2e-16 ***
## Prop. Mediated (control)   0.6853      -0.0724         1.85   0.077 .  
## Prop. Mediated (treated)   0.6266      -0.1365         1.64   0.111    
## ACME (average)            21.3110       0.8868        43.47   0.039 *  
## ADE (average)             11.0748     -13.2950        35.73   0.383    
## Prop. Mediated (average)   0.6559       0.0266         1.61   0.039 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 35 
## 
## 
## Simulations: 10000
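
As a quick consistency check on the summary above: with a treatment-by-mediator interaction in the outcome model, the total effect decomposes as ACME (control) + ADE (treated), or equivalently ACME (treated) + ADE (control), and the "average" rows are the means of the control and treated rows. The sketch below checks this with the QIAamp values printed above; the variable names are illustrative only.

```r
# Values copied from the QIAamp summary printed above
acme_control <- 22.3244
acme_treated <- 20.2975
ade_control  <- 12.0883
ade_treated  <- 10.0614
total_effect <- 32.3858

# Two equivalent decompositions of the total effect
decomp_1 <- acme_control + ade_treated
decomp_2 <- acme_treated + ade_control

# "Average" rows are the means of the control and treated estimates
acme_average <- mean(c(acme_control, acme_treated))
ade_average  <- mean(c(ade_control, ade_treated))

# The tolerance allows for rounding in the printed output
stopifnot(abs(decomp_1 - total_effect) < 1e-3,
          abs(decomp_2 - total_effect) < 1e-3)
round(c(acme_average = acme_average, ade_average = ade_average), 3)
```

The proportion mediated is reported on its own row and need not equal ACME / total effect exactly (0.6559 printed, versus 21.3110 / 32.3858, which is about 0.658), presumably because it is summarized over the simulation draws rather than computed from the point estimates.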

Tidy table of mediation analysis results

tableS4_a <- rbind(final.out.lypma,
      final.out.benzonase,
      final.out.host_zero, 
      final.out.molysis,
      final.out.qiaamp) %>%
mutate(Treatment = c("lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"), .before = "Indirect est.") 

tableS4_a

Mediation analysis for function

outcome = S.obs (with phyloseq$)
exposure = treatment (stratified)
mediator = Final_reads
mediator-outcome confounders = sample_type
exposure-mediator confounders = NA
outcome model = Mixed effects linear regression
mediator model = Mixed effects linear regression

Mediation analysis was conducted separately for each treatment (stratified by treatment).

Table S4. Mediation analysis results: estimated effect sizes and p-values for the indirect effect, direct effect, total effect, and proportion mediated. The analysis used treatment as exposure, log10(final reads) as mediator, functional richness as outcome, and sample type as mediator-outcome confounder. The analysis was stratified by treatment, with each treatment coded as a binary variable (treated vs. control).
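
Read off the `lmer()` calls below, the two models fitted for each treatment can be written as follows, where $T_{ij}$ is the binary treatment indicator for sample $j$ of subject $i$, $M_{ij}$ is log10(Final_reads), and $Y_{ij}$ is S.obs. The symbols ($\alpha$, $\beta$, $u_i$, $v_i$) are notation introduced here for illustration and do not appear in the source.

```latex
\begin{aligned}
M_{ij} &= \alpha_0 + \alpha_1 T_{ij}
          + \boldsymbol{\alpha}_2^{\top}\,\mathrm{sample\_type}_{ij}
          + u_i + \varepsilon_{ij}, \\
Y_{ij} &= \beta_0 + \beta_1 T_{ij} + \beta_2 M_{ij} + \beta_3\, T_{ij} M_{ij}
          + \boldsymbol{\beta}_4^{\top}\,\mathrm{sample\_type}_{ij}
          + v_i + \epsilon_{ij}.
\end{aligned}
```

Here $u_i$ and $v_i$ are the random intercepts for `subject_id` in the mediator and outcome models, and the $\beta_3$ term is the treatment-by-mediator interaction, which is why the summaries report separate control and treated values for ACME and ADE.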

detach_package(lmerTest)

## all treatment groups

# Keep respiratory samples only, filtering on the data frame's own sample_type column
sample_data_respiratory_path <- phyloseq$phyloseq_path_rpk %>%
        sample_data %>%
        data.frame %>%
        subset(sample_type %in% c("Sputum", "BAL", "Nasal"))


med.fit <- lmer(log10(Final_reads) ~ lypma +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$lypma == 1) %>% subset(., !is.na(.$S.obs)))

out.fit <- lmer(S.obs ~ lypma * log10(Final_reads) + sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$lypma == 1) %>% subset(., !is.na(.$S.obs)))
set.seed(seed)

med.out.lypma <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "lypma",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.lypma)

final.out.lypma <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## benzonase


med.fit <- lmer(log10(Final_reads) ~ benzonase + sample_type + (1|subject_id),
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$benzonase == 1) %>% subset(., !is.na(.$S.obs)))

out.fit <- lmer(S.obs ~ benzonase * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$benzonase == 1) %>% subset(., !is.na(.$S.obs)))

set.seed(seed)
med.out.benzonase <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "benzonase",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.benzonase)

final.out.benzonase <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## host zero


med.fit <- lmer(log10(Final_reads) ~ host_zero +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$host_zero == 1) %>% subset(., !is.na(.$S.obs)))

out.fit <- lmer(S.obs ~ host_zero * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$host_zero == 1) %>% subset(., !is.na(.$S.obs)))
set.seed(seed)
med.out.hostzero <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "host_zero",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.hostzero)

final.out.host_zero <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))


## molysis


med.fit <- lmer(log10(Final_reads) ~ molysis +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$molysis == 1) %>% subset(., !is.na(.$S.obs)))

out.fit <- lmer(S.obs ~ molysis * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$molysis == 1) %>% subset(., !is.na(.$S.obs)))
set.seed(seed)
med.out.molysis <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "molysis",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.molysis)

final.out.molysis <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## qiaamp


med.fit <- lmer(log10(Final_reads) ~ qiaamp +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$qiaamp == 1) %>% subset(., !is.na(.$S.obs)))

out.fit <- lmer(S.obs ~ qiaamp * log10(Final_reads) + sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory_path, sample_data_respiratory_path$control == 1 | sample_data_respiratory_path$qiaamp == 1) %>% subset(., !is.na(.$S.obs)))

set.seed(seed)
med.out_qiaamp <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "qiaamp",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out_qiaamp)

final.out.qiaamp <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))
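
The five blocks above build the same one-row table from each `summary()` object. A minimal sketch of how that step could be factored out is shown here; `tidy_mediation()` is a hypothetical helper, not part of the original script, and the mock input uses placeholder values only so the example runs without fitting any model.

```r
# Hypothetical helper: turn a summary.mediate-like list into one tidy row.
# It only reads the fields already used in the blocks above.
tidy_mediation <- function(out.sum, digits = 3) {
  row <- data.frame(
    `Indirect est.`                = out.sum$d.avg,
    `Indirect p-value`             = out.sum$d.avg.p,
    `Direct est.`                  = out.sum$z.avg,
    `Direct p-value`               = out.sum$z.avg.p,
    `Total est.`                   = out.sum$tau.coef,
    `Total p-value`                = out.sum$tau.p,
    `Proportion mediation est.`    = out.sum$n.avg,
    `Proportion mediation p-value` = out.sum$n.avg.p,
    check.names = FALSE
  )
  round(row, digits)
}

# Mock object with placeholder values (not results from this analysis)
mock_sum <- list(d.avg = 1.23456, d.avg.p = 0.5, z.avg = 2.34567, z.avg.p = 0.04,
                 tau.coef = 3.58023, tau.p = 0.01, n.avg = 0.34482, n.avg.p = 0.5)

mock_row <- tidy_mediation(mock_sum)
mock_row
```

With real fits, the intended usage would be along the lines of `final.out.lypma <- tidy_mediation(summary(med.out.lypma))`.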

Raw mediation outputs - lyPMA

med.out.lypma %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value   
## ACME (control)             3.6749      -9.1683        21.04  0.5796   
## ACME (treated)            10.6970     -24.4128        49.74  0.5432   
## ADE (control)             46.5096       9.7044        84.73  0.0160 * 
## ADE (treated)             53.5317      15.7030        95.60  0.0072 **
## Total Effect              57.2066      13.7040       103.12  0.0116 * 
## Prop. Mediated (control)   0.0444      -0.3516         0.34  0.5720   
## Prop. Mediated (treated)   0.1774      -0.9425         0.77  0.5352   
## ACME (average)             7.1859     -16.4883        32.98  0.5426   
## ADE (average)             50.0207      14.5112        87.35  0.0074 **
## Prop. Mediated (average)   0.1109      -0.6557         0.51  0.5346   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Raw mediation outputs - Benzonase

med.out.benzonase %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)            45.2255      17.0243        82.33  0.0004 ***
## ACME (treated)            29.9684      -7.0963        74.74  0.1222    
## ADE (control)             46.3120       1.0350        91.62  0.0456 *  
## ADE (treated)             31.0549      -8.4013        69.79  0.1204    
## Total Effect              76.2804      42.7108       109.74  <2e-16 ***
## Prop. Mediated (control)   0.5866       0.2212         1.16  0.0004 ***
## Prop. Mediated (treated)   0.3868      -0.0966         0.98  0.1222    
## ACME (average)            37.5970      10.4486        72.89  0.0062 ** 
## ADE (average)             38.6835       1.1376        76.87  0.0436 *  
## Prop. Mediated (average)   0.4867       0.1372         0.98  0.0062 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 35 
## 
## 
## Simulations: 10000

Raw mediation outputs - HostZERO

med.out.hostzero %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)             85.341       36.141       143.39  0.0006 ***
## ACME (treated)             28.353      -13.170        74.36  0.1882    
## ADE (control)              94.030       43.662       144.84  0.0004 ***
## ADE (treated)              37.042      -21.360        93.60  0.2090    
## Total Effect              122.383       89.106       154.94  <2e-16 ***
## Prop. Mediated (control)    0.692        0.289         1.20  0.0006 ***
## Prop. Mediated (treated)    0.230       -0.112         0.60  0.1882    
## ACME (average)             56.847       19.573       100.80  0.0032 ** 
## ADE (average)              65.536       19.159       112.00  0.0072 ** 
## Prop. Mediated (average)    0.461        0.158         0.82  0.0032 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 35 
## 
## 
## Simulations: 10000

Raw mediation outputs - MolYsis

med.out.molysis %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                           Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)            77.69495     27.37358       139.74  0.0010 ***
## ACME (treated)            34.40433      1.04967        74.80  0.0432 *  
## ADE (control)             90.56676     45.36088       140.04  <2e-16 ***
## ADE (treated)             47.27614    -14.33392       105.50  0.1248    
## Total Effect             124.97109     84.62700       166.89  <2e-16 ***
## Prop. Mediated (control)   0.61178      0.22487         1.14  0.0010 ***
## Prop. Mediated (treated)   0.27028      0.00817         0.58  0.0432 *  
## ACME (average)            56.04964     23.00149        96.45  <2e-16 ***
## ADE (average)             68.92145     23.70209       113.47  0.0032 ** 
## Prop. Mediated (average)   0.44103      0.19069         0.77  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 35 
## 
## 
## Simulations: 10000

Raw mediation outputs - QIAamp

med.out_qiaamp %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)             89.667       28.274       159.25   0.003 ** 
## ACME (treated)             77.261       20.186       141.85   0.008 ** 
## ADE (control)              35.405      -27.414        98.75   0.264    
## ADE (treated)              22.999      -46.124        92.26   0.511    
## Total Effect              112.666       75.032       149.76  <2e-16 ***
## Prop. Mediated (control)    0.793        0.246         1.50   0.003 ** 
## Prop. Mediated (treated)    0.686        0.186         1.29   0.008 ** 
## ACME (average)             83.464       32.645       141.73   0.001 ***
## ADE (average)              29.202      -28.441        87.15   0.320    
## Prop. Mediated (average)    0.740        0.284         1.31   0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Tidy table of mediation analysis results (for functional richness)

tableS4_b <- rbind(final.out.lypma,
      final.out.benzonase,
      final.out.host_zero, 
      final.out.molysis,
      final.out.qiaamp) %>%
mutate(Treatment = c("lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"), .before = "Indirect est.") 

tableS4_b

Table S4. Mediation analysis for viruses

outcome = V.obs
exposure = treatment (stratified)
mediator = Final_reads
mediator-outcome confounders = sample_type
exposure-mediator confounders = NA
outcome model = Mixed effects linear regression
mediator model = Mixed effects linear regression

Mediation analysis was conducted separately for each treatment (stratified by treatment).

Table S4. Mediation analysis results: estimated effect sizes and p-values for the indirect effect, direct effect, total effect, and proportion mediated. The analysis used treatment as exposure, log10(final reads) as mediator, viral species richness as outcome, and sample type as mediator-outcome confounder. The analysis was stratified by treatment, with each treatment coded as a binary variable (treated vs. control).

## lypma
# only mediator-outcome confounders


detach_package(lmerTest) #lmerTest induces error

## all treatment groups

# Keep respiratory samples with a non-missing V.obs, filtering on the data frame's own columns
sample_data_respiratory <- phyloseq_unfiltered$phyloseq_count %>%
        subset_samples(sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
        sample_data %>%
        data.frame %>%
        subset(!is.na(V.obs))


med.fit <- lmer(log10(Final_reads) ~ lypma +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$lypma == 1))

out.fit <- lmer(V.obs ~ lypma * log10(Final_reads) + sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$lypma == 1))
set.seed(seed)
med.out.lypma_v <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "lypma",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.lypma_v)

final.out.lypma_v <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## benzonase


med.fit <- lmer(log10(Final_reads) ~ benzonase + sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$benzonase == 1))

out.fit <- lmer(V.obs ~ benzonase * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$benzonase == 1))

set.seed(seed)
med.out.benzonase_v <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "benzonase",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.benzonase_v)

final.out.benzonase_v <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## host zero


med.fit <- lmer(log10(Final_reads) ~ host_zero +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$host_zero == 1))

out.fit <- lmer(V.obs ~ host_zero * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$host_zero == 1))
set.seed(seed)
med.out.hostzero_v <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "host_zero",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.hostzero_v)

final.out.host_zero_v <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))


## molysis


med.fit <- lmer(log10(Final_reads) ~ molysis +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$molysis == 1))

out.fit <- lmer(V.obs ~ molysis * log10(Final_reads) +  sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$molysis == 1))
set.seed(seed)
med.out.molysis_v <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "molysis",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out.molysis_v)

final.out.molysis_v <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))

## qiaamp


med.fit <- lmer(log10(Final_reads) ~ qiaamp +  sample_type + (1|subject_id),
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$qiaamp == 1))

out.fit <- lmer(V.obs ~ qiaamp * log10(Final_reads) + sample_type + (1|subject_id), 
                data = subset(sample_data_respiratory, sample_data_respiratory$control == 1 | sample_data_respiratory$qiaamp == 1))

set.seed(seed)
med.out_qiaamp_v <- mediate(model.m = med.fit,
                   model.y = out.fit,
                     treat = "qiaamp",
                     mediator = "log10(Final_reads)",
                     sims = 10000)

out.sum <- summary(med.out_qiaamp_v)

final.out.qiaamp_v <- data.frame(`Indirect est.`=out.sum$d.avg, 
                          `Indirect p-value`=out.sum$d.avg.p,
                          `Direct est.`=out.sum$z.avg, 
                          `Direct p-value`=out.sum$z.avg.p, 
                          `Total est.`=out.sum$tau.coef, 
                          `Total p-value`=out.sum$tau.p,
                          `Proportion mediation est.`=out.sum$n.avg, 
                          `Proportion mediation p-value`=out.sum$n.avg.p,
                          check.names = F) %>% 
    mutate_all(~round(., 3))
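
The per-treatment blocks above differ only in the name of the treatment column. As a rough sketch, the mediator and outcome formulas could be generated from the treatment name; `build_formulas()` is a hypothetical helper introduced for illustration, and whether `mediate()` behaves identically with models fitted from programmatically built formulas is an assumption that should be verified before use.

```r
treatments <- c("lypma", "benzonase", "host_zero", "molysis", "qiaamp")

# Hypothetical helper: build the mediator and outcome formulas for one treatment
build_formulas <- function(treatment, outcome = "V.obs") {
  list(
    mediator = as.formula(paste("log10(Final_reads) ~", treatment,
                                "+ sample_type + (1|subject_id)")),
    outcome  = as.formula(paste(outcome, "~", treatment,
                                "* log10(Final_reads) + sample_type + (1|subject_id)"))
  )
}

# One pair of formulas per treatment, named by treatment
formulas <- lapply(setNames(treatments, treatments), build_formulas)
formulas$molysis$outcome
```

Each pair could then be passed to `lmer()` on the matching control-plus-treatment subset, keeping `set.seed(seed)` before every `mediate()` call as in the original blocks.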

Raw mediation outputs - lyPMA

med.out.lypma_v %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value
## ACME (control)            0.36552     -2.20995         3.11    0.77
## ACME (treated)            0.43085     -2.66639         3.83    0.77
## ADE (control)             0.00651     -3.60580         3.84    1.00
## ADE (treated)             0.07185     -3.57574         3.97    0.99
## Total Effect              0.43736     -4.10180         5.25    0.87
## Prop. Mediated (control)  0.28714     -5.44982         5.15    0.61
## Prop. Mediated (treated)  0.35765     -5.99530         6.05    0.61
## ACME (average)            0.39819     -2.38467         3.27    0.77
## ADE (average)             0.03918     -3.51113         3.79    0.99
## Prop. Mediated (average)  0.32239     -5.78991         5.48    0.61
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Raw mediation outputs - Benzonase

med.out.benzonase_v %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value   
## ACME (control)             6.2073       1.3430        12.58  0.0084 **
## ACME (treated)             7.0702       0.3184        15.66  0.0392 * 
## ADE (control)              0.1338      -8.9771         8.95  0.9776   
## ADE (treated)              0.9968      -7.0311         9.17  0.8114   
## Total Effect               7.2040      -0.0368        14.46  0.0510 . 
## Prop. Mediated (control)   0.8296      -0.6458         4.48  0.0594 . 
## Prop. Mediated (treated)   0.9435      -0.7582         4.99  0.0862 . 
## ACME (average)             6.6387       1.9651        12.82  0.0042 **
## ADE (average)              0.5653      -7.1233         8.11  0.8794   
## Prop. Mediated (average)   0.8865      -0.5962         4.65  0.0552 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Raw mediation outputs - HostZERO

med.out.hostzero_v %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value  
## ACME (control)             26.759       -6.716        63.97   0.118  
## ACME (treated)             37.689        7.249        72.76   0.014 *
## ADE (control)              -6.555      -43.588        29.55   0.723  
## ADE (treated)               4.375      -36.636        45.37   0.843  
## Total Effect               31.134        6.855        55.42   0.013 *
## Prop. Mediated (control)    0.848       -0.282         3.91   0.129  
## Prop. Mediated (treated)    1.203        0.187         4.69   0.028 *
## ACME (average)             32.224        6.937        61.78   0.012 *
## ADE (average)              -1.090      -34.544        32.22   0.958  
## Prop. Mediated (average)    1.025        0.164         4.04   0.025 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Raw mediation outputs - MolYsis

med.out.molysis_v %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value    
## ACME (control)             18.706       -4.596        46.80  0.1250    
## ACME (treated)             38.091       16.179        65.54  0.0002 ***
## ADE (control)               2.771      -23.259        28.08  0.8294    
## ADE (treated)              22.157       -8.813        56.07  0.1698    
## Total Effect               40.862       17.826        64.84  0.0004 ***
## Prop. Mediated (control)    0.455       -0.119         1.32  0.1254    
## Prop. Mediated (treated)    0.931        0.419         1.96  0.0006 ***
## ACME (average)             28.398       10.946        49.89  0.0002 ***
## ADE (average)              12.464      -11.556        37.28  0.3162    
## Prop. Mediated (average)    0.693        0.280         1.49  0.0006 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Raw mediation outputs - QIAamp

med.out_qiaamp_v %>% summary
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
## Mediator Groups: subject_id 
## 
## Outcome Groups: subject_id 
## 
## Output Based on Overall Averages Across Groups 
## 
##                          Estimate 95% CI Lower 95% CI Upper p-value  
## ACME (control)             23.212       -1.092        49.91   0.063 .
## ACME (treated)             28.939        5.100        55.48   0.017 *
## ADE (control)              -8.694      -36.546        18.77   0.532  
## ADE (treated)              -2.967      -31.509        25.26   0.839  
## Total Effect               20.245        4.309        36.14   0.013 *
## Prop. Mediated (control)    1.138       -0.121         4.94   0.076 .
## Prop. Mediated (treated)    1.423        0.178         5.64   0.030 *
## ACME (average)             26.076        6.202        48.54   0.010 *
## ADE (average)              -5.831      -30.260        18.47   0.635  
## Prop. Mediated (average)    1.281        0.232         5.11   0.023 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 34 
## 
## 
## Simulations: 10000

Tidy table of mediation analysis results

tableS4_c <- rbind(final.out.lypma_v,
      final.out.benzonase_v,
      final.out.host_zero_v, 
      final.out.molysis_v,
      final.out.qiaamp_v) %>%
mutate(Treatment = c("lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"), .before = "Indirect est.") 

tableS4_c

Mediation analysis - combined table (species and functional richness)

tables4 <- rbind (data.frame(Outcome = "Microbial species richness",
           tableS4_a),
       data.frame(Outcome = "Functional richness",
           tableS4_b)
       ) %>% 
        kbl(format = "html") %>%
        kable_styling(full_width = 0, html_font = "sans")

save_kable(tables4, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS4.html", self_contained = T)



tables4_updated <- rbind (data.frame(Outcome = "Microbial species richness",
           tableS4_a),
       data.frame(Outcome = "Functional richness",
           tableS4_b),
       data.frame(Outcome = "Viral species richness",
           tableS4_c)
       ) %>%
        kbl(format = "html") %>%
        kable_styling(full_width = 0, html_font = "sans")

save_kable(tables4_updated, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS4_updated.html", self_contained = T)
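
One detail worth checking in the combined tables: `data.frame()` defaults to `check.names = TRUE`, so wrapping a table whose column names contain spaces or hyphens (such as "Indirect est." or "Indirect p-value") rewrites them into syntactic names. The toy example below uses made-up values to illustrate the behaviour and one possible way to keep the original headers; whether this matters for the saved HTML tables is an assumption to verify against the rendered output.

```r
# Toy stand-in for one tidy table (values are made up)
toy <- data.frame(Treatment = c("lyPMA", "Benzonase"),
                  `Indirect est.` = c(1.2, 3.4),
                  `Indirect p-value` = c(0.50, 0.01),
                  check.names = FALSE)

# Default behaviour: column names are passed through make.names()
mangled <- data.frame(Outcome = "Microbial species richness", toy)
names(mangled)

# Keeping the original headers
kept <- data.frame(Outcome = "Microbial species richness", toy,
                   check.names = FALSE)
names(kept)
```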

x. How did individual taxa change?

x. How did individual taxa change? Differential abundance analysis

    1.  Volcano plot with Mock, BAL, Nasal and Sputum (Figure S5)
            a.  MaAsLin (feature ~ lyPMA + Benzonase + HostZERO + MolYsis + QIAamp, RE = subject_id)
                    i.  Make sure this analysis accounts for the compositional nature of the data
            b.  Or MaAsLin (feature ~ sample type + treatment + sample type * treatment, RE = subject_id)
    2.  Balloon plots for BAL, Nasal and Sputum (Figure 4)
            a.  Add q-value, mean relative abundance, Gram stain, phylogenetic information
    3.  List of differentially abundant taxa by treatment method (Table S8)
            a.  Subset with low q-value (< 0.1) and large fold change (|est.| > 1)
                    

DA taxa

Factors affecting DA taxa (main text)

setwd("Project_SICAS2_microbiome/5_Scripts/MGK/Host_depletion_git")

filt_maaslin_all <- read.csv("data/da_lmer_filt_maaslin_all.csv")
#filt_maaslin_interaction <- read.csv("data/da_lmer_filt_maaslin_interaction.csv")
filt_fit_data_bal <- read.csv("data/da_lmer_filt_fit_data_bal.csv")
filt_fit_data_spt <- read.csv("data/da_lmer_filt_fit_data_spt.csv")
filt_fit_data_ns <- read.csv("data/da_lmer_filt_fit_data_ns.csv")
#filt_fit_data_pos <- read.csv("data/da_lmer_filt_fit_data_pos.csv")

MaAsLin output (raw data)

filt_maaslin_all

MaAsLin output (stratified)

BAL stratified MaAsLin result

filt_fit_data_bal

Nasal stratified MaAsLin result

filt_fit_data_ns

Sputum stratified MaAsLin result

filt_fit_data_spt

Number of taxa under selected conditions

These counts were calculated to report the details in the main text

cat("Factors affecting DA taxa (q<0.1)")
## Factors affecting DA taxa (q<0.1)
filt_maaslin_all %>% subset(., .$qval < 0.1 ) %>% subset(., abs(.$Estimate) > 1) %>% .$metadata %>% table
## .
##   benzonase   host_zero       lypma     molysis      qiaamp sample_type 
##          27          75          25          92          57         145
filt_maaslin_all %>% subset(., .$qval < 0.1 ) %>% .$metadata %>% table
## .
##   benzonase   host_zero       lypma     molysis      qiaamp sample_type 
##          81          90          27          97          92         152
cat("Taxa with positive & significant changes (q<0.1)")
## Taxa with positive & significant changes (q<0.1)
filt_maaslin_all %>% subset(., .$qval < 0.1 ) %>% subset(., .$Estimate > 0) %>% .$feature %>% unique
##  [1] "Actinomyces_graevenitzii"            "Actinomyces_johnsonii"              
##  [3] "Actinomyces_massiliensis"            "Actinomyces_naeslundii"             
##  [5] "Actinomyces_odontolyticus"           "Actinomyces_oris"                   
##  [7] "Actinomyces_sp_HMSC035G02"           "Actinomyces_sp_ICM47"               
##  [9] "Actinomyces_sp_S6_Spd3"              "Actinomyces_sp_oral_taxon_170"      
## [11] "Actinomyces_sp_oral_taxon_180"       "Actinomyces_sp_oral_taxon_181"      
## [13] "Actinomyces_viscosus"                "Aeriscardovia_aeriphila"            
## [15] "Scardovia_wiggsiae"                  "Corynebacterium_accolens"           
## [17] "Corynebacterium_atypicum"            "Corynebacterium_durum"              
## [19] "Corynebacterium_pseudogenitalium"    "Rothia_aeria"                       
## [21] "Rothia_dentocariosa"                 "Rothia_mucilaginosa"                
## [23] "Cutibacterium_acnes"                 "Cutibacterium_granulosum"           
## [25] "Propionibacterium_humerusii"         "Propionibacterium_namnetense"       
## [27] "Pseudopropionibacterium_propionicum" "Atopobium_parvulum"                 
## [29] "Atopobium_rimae"                     "Olsenella_scatoligenes"             
## [31] "Collinsella_aerofaciens"             "Collinsella_intestinalis"           
## [33] "Collinsella_stercoris"               "X.Collinsella._massiliensis"        
## [35] "Slackia_exigua"                      "Slackia_isoflavoniconvertens"       
## [37] "Thermoleophilum_album"               "Porphyromonas_somerae"              
## [39] "Gemella_bergeri"                     "Gemella_haemolysans"                
## [41] "Gemella_morbillorum"                 "Gemella_sanguinis"                  
## [43] "Staphylococcus_argenteus"            "Staphylococcus_epidermidis"         
## [45] "Staphylococcus_schweitzeri"          "Abiotrophia_defectiva"              
## [47] "Dolosigranulum_pigrum"               "Granulicatella_adiacens"            
## [49] "Granulicatella_elegans"              "Streptococcus_anginosus"            
## [51] "Streptococcus_australis"             "Streptococcus_gordonii"             
## [53] "Streptococcus_infantis"              "Streptococcus_mitis"                
## [55] "Streptococcus_oralis"                "Streptococcus_parasanguinis"        
## [57] "Streptococcus_peroris"               "Streptococcus_pneumoniae"           
## [59] "Streptococcus_pseudopneumoniae"      "Streptococcus_salivarius"           
## [61] "Streptococcus_sanguinis"             "Streptococcus_sp_A12"               
## [63] "Streptococcus_sp_F0442"              "Streptococcus_sp_HPH0090"           
## [65] "Streptococcus_sp_M334"               "Streptococcus_vestibularis"         
## [67] "Eubacterium_brachy"                  "Eubacterium_infirmum"               
## [69] "Eubacterium_sulci"                   "Mogibacterium_diversum"             
## [71] "Lachnoanaerobaculum_saburreum"       "Oribacterium_sinus"                 
## [73] "Bulleidia_extructa"                  "Solobacterium_moorei"               
## [75] "Limnochorda_pilosa"                  "Veillonella_dispar"                 
## [77] "Veillonella_parvula"                 "Parvimonas_micra"                   
## [79] "Fusobacterium_nucleatum"             "Cupriavidus_sp"                     
## [81] "Sutterella_parvirubra"               "Pseudomonas_aeruginosa"             
## [83] "Malassezia_restricta"                "Neisseria_flavescens"               
## [85] "Stenotrophomonas_maltophilia"
cat("Taxa with negative & significant changes (q<0.1)")
## Taxa with negative & significant changes (q<0.1)
filt_maaslin_all %>% subset(., .$qval < 0.1 ) %>% subset(., .$Estimate < 0) %>% .$feature %>% unique
##   [1] "Actinomyces_georgiae"                
##   [2] "Actinomyces_hongkongensis"           
##   [3] "Actinomyces_sp_oral_taxon_414"       
##   [4] "Corynebacterium_accolens"            
##   [5] "Corynebacterium_atypicum"            
##   [6] "Corynebacterium_pseudodiphtheriticum"
##   [7] "Corynebacterium_pseudogenitalium"    
##   [8] "Corynebacterium_tuberculostearicum"  
##   [9] "Cutibacterium_acnes"                 
##  [10] "Cutibacterium_granulosum"            
##  [11] "Propionibacterium_humerusii"         
##  [12] "Propionibacterium_namnetense"        
##  [13] "Atopobium_deltae"                    
##  [14] "Atopobium_parvulum"                  
##  [15] "Asaccharobacter_celatus"             
##  [16] "Denitrobacterium_detoxificans"       
##  [17] "Enterorhabdus_caecimuris"            
##  [18] "Rubrobacter_radiotolerans"           
##  [19] "Porphyromonas_endodontalis"          
##  [20] "Prevotella_histicola"                
##  [21] "Prevotella_melaninogenica"           
##  [22] "Prevotella_oris"                     
##  [23] "Kouleothrix_aurantiaca"              
##  [24] "Hydrogenibacillus_schlegelii"        
##  [25] "Gemella_asaccharolytica"             
##  [26] "Brochothrix_campestris"              
##  [27] "Staphylococcus_epidermidis"          
##  [28] "Dolosigranulum_pigrum"               
##  [29] "Enterococcus_faecalis"               
##  [30] "Lactobacillus_rhamnosus"             
##  [31] "Streptococcus_anginosus"             
##  [32] "Streptococcus_parasanguinis"         
##  [33] "Eubacterium_infirmum"                
##  [34] "Oribacterium_parvum"                 
##  [35] "Oribacterium_sp_oral_taxon_078"      
##  [36] "Stomatobaculum_longum"               
##  [37] "Veillonella_atypica"                 
##  [38] "Veillonella_dispar"                  
##  [39] "Veillonella_infantium"               
##  [40] "Veillonella_sp_T11011_6"             
##  [41] "Anaerococcus_nagyae"                 
##  [42] "Finegoldia_magna"                    
##  [43] "Fusobacterium_nucleatum"             
##  [44] "Leptotrichia_wadei"                  
##  [45] "Gemmata_obscuriglobus"               
##  [46] "Paludisphaera_borealis"              
##  [47] "Achromobacter_xylosoxidans"          
##  [48] "Cupriavidus_sp"                      
##  [49] "Sutterella_parvirubra"               
##  [50] "Cardiobacterium_valvarum"            
##  [51] "Alkalilimnicola_ehrlichii"           
##  [52] "Escherichia_coli"                    
##  [53] "Thiohalorhabdus_denitrificans"       
##  [54] "Pseudomonas_aeruginosa"              
##  [55] "Acholeplasma_oculi"                  
##  [56] "Candida_albicans"                    
##  [57] "Candida_dubliniensis"                
##  [58] "Malassezia_restricta"                
##  [59] "Nakamurella_silvestris"              
##  [60] "Alloprevotella_tannerae"             
##  [61] "Prevotella_jejuni"                   
##  [62] "Prevotella_pallens"                  
##  [63] "Prevotella_sp_oral_taxon_306"        
##  [64] "Prevotella_veroralis"                
##  [65] "Tannerella_sp_oral_taxon_808"        
##  [66] "Bacillus_ginsengihumi"               
##  [67] "Listeria_floridensis"                
##  [68] "Agitococcus_lubricus"                
##  [69] "Lactobacillus_fermentum"             
##  [70] "Lactobacillus_gasseri"               
##  [71] "Sharpea_azabuensis"                  
##  [72] "Oribacterium_asaccharolyticum"       
##  [73] "Stenotrophomonas_maltophilia"        
##  [74] "Stenotrophomonas_pavanii"            
##  [75] "Stenotrophomonas_rhizophila"         
##  [76] "Bacillus_intestinalis"               
##  [77] "Listeria_innocua"                    
##  [78] "Listeria_monocytogenes"              
##  [79] "Staphylococcus_haemolyticus"         
##  [80] "Eggerthia_catenaformis"              
##  [81] "Leptotrichia_sp_oral_taxon_498"      
##  [82] "Rickettsia_felis"                    
##  [83] "Haemophilus_parainfluenzae"          
##  [84] "Candida_orthopsilosis"               
##  [85] "Nannocystis_exedens"                 
##  [86] "Campylobacter_concisus"              
##  [87] "Franconibacter_helveticus"           
##  [88] "Klebsiella_pneumoniae"               
##  [89] "Salmonella_enterica"                 
##  [90] "Superficieibacter_electus"           
##  [91] "Pseudomonas_fluorescens"             
##  [92] "Saccharomyces_cerevisiae"            
##  [93] "S._cerevisiae.x.S._kudriavzevii"     
##  [94] "Saccharomyces_kudriavzevii"          
##  [95] "Cryptococcus_gattii_VGI"             
##  [96] "Cryptococcus_gattii_VGII"            
##  [97] "Cryptococcus_gattii_VGIII"           
##  [98] "Cryptococcus_neoformans"             
##  [99] "Listeria_marthii"                    
## [100] "Erwinia_persicina"                   
## [101] "Pseudomonas_formosensis"             
## [102] "Pseudomonas_putida"
cat("BAL highly changed taxa (q<0.1 & abs(Estimate) > 1)")
## BAL highly changed taxa (q<0.1 & abs(Estimate) > 1)
filt_fit_data_bal %>% subset(., .$qval < 0.1 ) %>% subset(., abs(.$Estimate) >1 ) %>% .$metadata %>% table
## < table of extent 0 >
cat("Sputum highly changed taxa (q<0.1 & abs(Estimate) > 1)")
## Sputum highly changed taxa (q<0.1 & abs(Estimate) > 1)
filt_fit_data_spt %>% subset(., .$qval < 0.1 ) %>% subset(., abs(.$Estimate) >1 ) %>% .$metadata %>% table
## .
## benzonase host_zero     lypma   molysis    qiaamp 
##        86       101        81       102       111
cat("NS highly changed taxa (q<0.1 & abs(Estimate) > 1)")
## NS highly changed taxa (q<0.1 & abs(Estimate) > 1)
filt_fit_data_ns %>% subset(., .$qval < 0.1 ) %>% subset(., abs(.$Estimate) >1 ) %>% .$metadata %>% table
## .
## benzonase host_zero     lypma   molysis    qiaamp 
##         1         2         7         2         5
cat("BAL taxa (q<0.1)")
## BAL taxa (q<0.1)
filt_fit_data_bal %>% subset(., .$qval < 0.1 ) %>% .$feature %>% unique
## character(0)
cat("NS taxa (q<0.1)")
## NS taxa (q<0.1)
filt_fit_data_ns %>% subset(., .$qval < 0.1 ) %>% .$feature %>% unique
##  [1] "Actinomyces_odontolyticus"           
##  [2] "Actinomyces_sp_ICM47"                
##  [3] "Aeriscardovia_aeriphila"             
##  [4] "Corynebacterium_accolens"            
##  [5] "Corynebacterium_pseudodiphtheriticum"
##  [6] "Olsenella_scatoligenes"              
##  [7] "X.Collinsella._massiliensis"         
##  [8] "Rubrobacter_radiotolerans"           
##  [9] "Thermoleophilum_album"               
## [10] "Gemella_asaccharolytica"             
## [11] "Staphylococcus_aureus"               
## [12] "Staphylococcus_epidermidis"          
## [13] "Streptococcus_oralis"                
## [14] "Limnochorda_pilosa"                  
## [15] "Finegoldia_magna"                    
## [16] "Cupriavidus_sp"                      
## [17] "Sutterella_parvirubra"               
## [18] "Acholeplasma_oculi"                  
## [19] "Malassezia_restricta"
cat("Sputum taxa (q<0.1)")
## Sputum taxa (q<0.1)
filt_fit_data_spt %>% subset(., .$qval < 0.1 ) %>% .$feature %>% unique
##   [1] "Actinomyces_georgiae"                
##   [2] "Actinomyces_hongkongensis"           
##   [3] "Actinomyces_johnsonii"               
##   [4] "Actinomyces_massiliensis"            
##   [5] "Actinomyces_naeslundii"              
##   [6] "Actinomyces_odontolyticus"           
##   [7] "Actinomyces_oris"                    
##   [8] "Actinomyces_sp_HMSC035G02"           
##   [9] "Actinomyces_sp_HPA0247"              
##  [10] "Actinomyces_sp_ICM47"                
##  [11] "Actinomyces_sp_S6_Spd3"              
##  [12] "Actinomyces_sp_oral_taxon_170"       
##  [13] "Actinomyces_sp_oral_taxon_180"       
##  [14] "Actinomyces_sp_oral_taxon_181"       
##  [15] "Actinomyces_sp_oral_taxon_414"       
##  [16] "Aeriscardovia_aeriphila"             
##  [17] "Alloscardovia_omnicolens"            
##  [18] "Bifidobacterium_dentium"             
##  [19] "Scardovia_wiggsiae"                  
##  [20] "Corynebacterium_accolens"            
##  [21] "Corynebacterium_atypicum"            
##  [22] "Corynebacterium_durum"               
##  [23] "Corynebacterium_matruchotii"         
##  [24] "Corynebacterium_pseudodiphtheriticum"
##  [25] "Rothia_aeria"                        
##  [26] "Rothia_dentocariosa"                 
##  [27] "Cutibacterium_acnes"                 
##  [28] "Propionibacterium_acidifaciens"      
##  [29] "Pseudopropionibacterium_propionicum" 
##  [30] "Atopobium_deltae"                    
##  [31] "Atopobium_parvulum"                  
##  [32] "Atopobium_rimae"                     
##  [33] "Olsenella_scatoligenes"              
##  [34] "Olsenella_uli"                       
##  [35] "Collinsella_aerofaciens"             
##  [36] "Collinsella_stercoris"               
##  [37] "Enorma_massiliensis"                 
##  [38] "X.Collinsella._massiliensis"         
##  [39] "Asaccharobacter_celatus"             
##  [40] "Denitrobacterium_detoxificans"       
##  [41] "Enterorhabdus_caecimuris"            
##  [42] "Slackia_exigua"                      
##  [43] "Slackia_isoflavoniconvertens"        
##  [44] "Rubrobacter_radiotolerans"           
##  [45] "Porphyromonas_endodontalis"          
##  [46] "Porphyromonas_somerae"               
##  [47] "Prevotella_histicola"                
##  [48] "Prevotella_melaninogenica"           
##  [49] "Prevotella_oris"                     
##  [50] "Kouleothrix_aurantiaca"              
##  [51] "Hydrogenibacillus_schlegelii"        
##  [52] "Gemella_asaccharolytica"             
##  [53] "Gemella_bergeri"                     
##  [54] "Gemella_haemolysans"                 
##  [55] "Gemella_morbillorum"                 
##  [56] "Brochothrix_campestris"              
##  [57] "Staphylococcus_argenteus"            
##  [58] "Staphylococcus_epidermidis"          
##  [59] "Staphylococcus_schweitzeri"          
##  [60] "Abiotrophia_defectiva"               
##  [61] "Abiotrophia_sp_HMSC24B09"            
##  [62] "Dolosigranulum_pigrum"               
##  [63] "Granulicatella_adiacens"             
##  [64] "Granulicatella_elegans"              
##  [65] "Enterococcus_faecalis"               
##  [66] "Lactobacillus_rhamnosus"             
##  [67] "Streptococcus_anginosus"             
##  [68] "Streptococcus_australis"             
##  [69] "Streptococcus_gordonii"              
##  [70] "Streptococcus_infantis"              
##  [71] "Streptococcus_mitis"                 
##  [72] "Streptococcus_oralis"                
##  [73] "Streptococcus_parasanguinis"         
##  [74] "Streptococcus_peroris"               
##  [75] "Streptococcus_pneumoniae"            
##  [76] "Streptococcus_pseudopneumoniae"      
##  [77] "Streptococcus_sanguinis"             
##  [78] "Streptococcus_sp_A12"                
##  [79] "Streptococcus_sp_F0442"              
##  [80] "Streptococcus_sp_HMSC034E03"         
##  [81] "Streptococcus_sp_HMSC067H01"         
##  [82] "Streptococcus_sp_HPH0090"            
##  [83] "Streptococcus_sp_M334"               
##  [84] "Eubacterium_brachy"                  
##  [85] "Eubacterium_infirmum"                
##  [86] "Eubacterium_sulci"                   
##  [87] "Mogibacterium_diversum"              
##  [88] "Mogibacterium_pumilum"               
##  [89] "Mogibacterium_timidum"               
##  [90] "Lachnoanaerobaculum_saburreum"       
##  [91] "Oribacterium_parvum"                 
##  [92] "Oribacterium_sinus"                  
##  [93] "Oribacterium_sp_oral_taxon_078"      
##  [94] "Stomatobaculum_longum"               
##  [95] "Peptostreptococcus_stomatis"         
##  [96] "Bulleidia_extructa"                  
##  [97] "Solobacterium_moorei"                
##  [98] "Limnochorda_pilosa"                  
##  [99] "Veillonella_atypica"                 
## [100] "Veillonella_dispar"                  
## [101] "Veillonella_infantium"               
## [102] "Veillonella_sp_T11011_6"             
## [103] "Parvimonas_micra"                    
## [104] "Leptotrichia_wadei"                  
## [105] "Gemmata_obscuriglobus"               
## [106] "Paludisphaera_borealis"              
## [107] "Achromobacter_xylosoxidans"          
## [108] "Cupriavidus_sp"                      
## [109] "Sutterella_parvirubra"               
## [110] "Anaerobiospirillum_thomasii"         
## [111] "Cardiobacterium_hominis"             
## [112] "Cardiobacterium_valvarum"            
## [113] "Alkalilimnicola_ehrlichii"           
## [114] "Escherichia_coli"                    
## [115] "Thiohalorhabdus_denitrificans"       
## [116] "Pseudomonas_aeruginosa"              
## [117] "Acholeplasma_oculi"                  
## [118] "Candida_albicans"                    
## [119] "Candida_dubliniensis"                
## [120] "Candida_parapsilosis"                
## [121] "Nakamurella_silvestris"              
## [122] "Prevotella_jejuni"                   
## [123] "Prevotella_pallens"                  
## [124] "Prevotella_salivae"                  
## [125] "Prevotella_sp_oral_taxon_306"        
## [126] "Prevotella_veroralis"                
## [127] "Tannerella_sp_oral_taxon_HOT_286"    
## [128] "Capnocytophaga_gingivalis"           
## [129] "Listeria_floridensis"                
## [130] "Agitococcus_lubricus"                
## [131] "Lactobacillus_fermentum"             
## [132] "Lactobacillus_gasseri"               
## [133] "Sharpea_azabuensis"                  
## [134] "Oribacterium_asaccharolyticum"       
## [135] "Megasphaera_micronuciformis"         
## [136] "Neisseria_flavescens"                
## [137] "Neisseria_subflava"                  
## [138] "Stenotrophomonas_maltophilia"        
## [139] "Stenotrophomonas_pavanii"            
## [140] "Stenotrophomonas_rhizophila"         
## [141] "Eggerthia_catenaformis"              
## [142] "Haemophilus_parainfluenzae"          
## [143] "Candida_orthopsilosis"               
## [144] "Nannocystis_exedens"                 
## [145] "Campylobacter_concisus"
cat("Sputum factors affecting DA taxa (q<0.1)")
## Sputum factors affecting DA taxa (q<0.1)
filt_fit_data_spt %>% subset(., .$qval < 0.1 ) %>% .$metadata %>% table
## .
## benzonase host_zero     lypma   molysis    qiaamp 
##        86       101        82       102       111
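
The tallies above repeat the same two filters (a q-value cutoff and an optional effect-size cutoff) on each result table. As a sketch only, they could be wrapped in one helper so that the printed label and the filter cannot drift apart. The function name `count_da` and the toy table are hypothetical; the column names `metadata`, `Estimate`, and `qval` follow the tables read in earlier.

```r
# Hypothetical helper: count associations per metadata variable that pass a
# q-value cutoff and, optionally, a minimum absolute effect size.
count_da <- function(res, q_cut = 0.1, min_abs_est = NULL) {
  keep <- !is.na(res$qval) & res$qval < q_cut
  if (!is.null(min_abs_est)) {
    keep <- keep & abs(res$Estimate) > min_abs_est
  }
  table(res$metadata[keep])
}

# Toy MaAsLin-style result table (values invented for illustration)
toy_res <- data.frame(
  feature  = c("tax1", "tax2", "tax3", "tax4"),
  metadata = c("lypma", "lypma", "qiaamp", "qiaamp"),
  Estimate = c(1.5, -0.4, -2.0, 0.2),
  qval     = c(0.01, 0.05, 0.001, 0.5)
)

count_da(toy_res)                   # q < 0.1 only
count_da(toy_res, min_abs_est = 1)  # q < 0.1 and |Estimate| > 1
```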

Fig. S8. Volcano plot

Fig. S8. Volcano plot of the differential abundance of microbes for each treatment, using the MaAsLin model (relative abundance of each taxon ~ sample type + lyPMA + Benzonase + HostZERO + MolYsis + QIAamp, random effect = subject ID).

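# Helpers for the significance table of each figure (top 20 taxa by q-value)
# species_italic: convert row names to markdown so that taxon names render in italics
species_italic <- function(data) {
  names <- gsub("_", " ", rownames(data))
  names <- gsub("[]]|[[]", "", names)
  # Match " sp" only as a whole word, so epithets that start with "sp" are untouched
  names <- gsub(" sp\\b", " sp.", names, perl = TRUE)
  # fixed = TRUE: match a literal ".", not the regex wildcard
  names <- gsub(" sp.", "* sp.", names, fixed = TRUE)
  names <- gsub(" group", "", names)
  # Names with "sp." italicize the genus only; all others are italicized in full
  names <- ifelse(grepl("[*]", names), paste("*", names, sep = ""), paste("*", names, "*", sep = ""))
  rownames(data) <- names
  data
}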
species_revise <- function(data) {
  data$feature <- gsub("Saccharomyces_cerevisiae_x_Saccharomyces_kudriavzevii", "S._cerevisiae x S._kudriavzevii", data$feature)
  data$feature <- gsub("Pseudomonas_aeruginosa_group", "Pseudomonas_aeruginosa", data$feature)
  data$feature <- gsub("Pseudomonas_fluorescens_group", "Pseudomonas_fluorescens", data$feature)
  data
}
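
To make the effect of the name formatting concrete, here is a small standalone sketch of the same idea applied to a character vector rather than to row names. The function `italicize_taxon` is a hypothetical simplification written for this illustration; it is not the author's `species_italic`. The asterisks are markdown emphasis markers of the kind `ggtext::element_markdown()` renders as italics on an axis.

```r
# Hypothetical, simplified version of the italicizing logic for a character vector.
italicize_taxon <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)
  # " sp" as a whole word marks an unnamed species: italicize the genus only
  has_sp <- grepl(" sp( |$)", x)
  x_sp <- sub(" sp( |$)", "* sp.\\1", x)
  ifelse(has_sp, paste0("*", x_sp), paste0("*", x, "*"))
}

italicize_taxon(c("Rothia_mucilaginosa", "Cupriavidus_sp", "Actinomyces_sp_ICM47"))
# "*Rothia mucilaginosa*" "*Cupriavidus* sp." "*Actinomyces* sp. ICM47"
```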



make_sig_table <- function(data) {
  sig_data <- spread(data[order(data$qval), c("feature", "metadata", "qval")], metadata, qval)
  sig_data <- species_revise(sig_data)
  sig_data$min <- apply(sig_data %>% dplyr::select(c("lypma", "benzonase", "molysis", "host_zero", "qiaamp")), 1, FUN = min)
  sig_data <- sig_data[order(sig_data$min),] %>% dplyr::select("feature", "lypma", "benzonase", "host_zero", "molysis", "qiaamp") %>% .[1:20,]
  sig_data[["feature"]] <- ifelse(sig_data[["feature"]] == "X.Collinsella._massiliensis", "[Collinsella]_massiliensis", sig_data[["feature"]])
  sig_data_italic <- sig_data %>% rownames_to_column(var = "-") %>%
          column_to_rownames(var = "feature") %>% species_italic %>% dplyr::select(-c("-")) %>%
          rename(lyPMA = lypma,  Benzonase = benzonase, `HostZERO` = host_zero, MolYsis = molysis, QIAamp = qiaamp)
  sig_data_sig <- ifelse(sig_data_italic < 0.1, "*", NA) %>% data.frame(check.names = F)
  return(list(data = sig_data, data_italic = sig_data_italic, data_sig = sig_data_sig))
}
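
The core of `make_sig_table` is a long-to-wide reshape of q-values followed by ranking features on their smallest q-value across treatments. The base-R sketch below walks through that step on invented data. The object names (`toy_long`, `toy_wide`, `n_keep`) are hypothetical, and `reshape()` stands in for the `spread()` call used above.

```r
# Toy long-format results: one q-value per feature x treatment (invented values)
toy_long <- data.frame(
  feature  = rep(c("tax1", "tax2", "tax3"), each = 2),
  metadata = rep(c("lypma", "qiaamp"), times = 3),
  qval     = c(0.20, 0.03, 0.001, 0.50, 0.40, 0.09)
)

# Long -> wide: one row per feature, one column per treatment
toy_wide <- reshape(toy_long, idvar = "feature", timevar = "metadata",
                    direction = "wide")
names(toy_wide) <- sub("^qval\\.", "", names(toy_wide))

# Rank features by their smallest q-value across treatments and keep the top n
n_keep <- 2
toy_wide$min_q <- apply(toy_wide[, c("lypma", "qiaamp")], 1, min, na.rm = TRUE)
toy_top <- head(toy_wide[order(toy_wide$min_q), ], n_keep)
toy_top$feature
```

Passing `na.rm = TRUE` to `min` is a choice made in this sketch: it lets a feature with a missing q-value for one treatment still be ranked on the others.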



#filt_fit_data_pos <- make_sig_table(filt_fit_data_pos)
filt_fit_data_bal <- make_sig_table(filt_fit_data_bal)
filt_fit_data_ns <- make_sig_table(filt_fit_data_ns)
filt_fit_data_spt <- make_sig_table(filt_fit_data_spt)

#filt_pos_sig <- subset_taxa(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Mock"),
#                                       taxa_names(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Mock")) %in% filt_fit_data_pos$data$feature)

#filt_fit_data_pos$rel <- cbind(filt_pos_sig %>% otu_table %>% t, filt_pos_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% species_italic() %>%  data.frame(check.names = F) %>% 
#        .[row.names(filt_fit_data_pos$data_italic),] %>%  mutate_all(~na_if(., 0)) %>% rownames_to_column("feature")

filt_spt_sig <- subset_taxa(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Sputum"), taxa_names(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Sputum")) %in% filt_fit_data_spt$data$feature)

filt_fit_data_spt$rel <- cbind(filt_spt_sig %>% otu_table %>% t, filt_spt_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% species_italic() %>%  data.frame(check.names = F) %>% 
        .[row.names(filt_fit_data_spt$data_italic),] %>%  mutate_all(~na_if(., 0)) %>% rownames_to_column("feature")

filt_ns_sig <- subset_taxa(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Nasal"),
                                       taxa_names(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "Nasal")) %in% filt_fit_data_ns$data$feature)

filt_fit_data_ns$rel <- cbind(filt_ns_sig %>% otu_table %>% t, filt_ns_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% species_italic() %>%  data.frame(check.names = F) %>% 
        .[row.names(filt_fit_data_ns$data_italic),] %>%  mutate_all(~na_if(., 0)) %>% rownames_to_column("feature")

filt_bal_sig <- subset_taxa(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "BAL"),
                                       taxa_names(subset_samples(phyloseq_unfiltered$phyloseq_rel, sample_type == "BAL")) %in% filt_fit_data_bal$data$feature)

filt_fit_data_bal$rel <- cbind(filt_bal_sig %>% otu_table %>% t, filt_bal_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% species_italic() %>%  data.frame(check.names = F) %>%
        .[row.names(filt_fit_data_bal$data_italic),] %>%
        mutate_all(~na_if(., 0)) %>% rownames_to_column("feature") %>% subset(., !grepl("NA", .$feature))
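
Each `$rel` table above holds the mean relative abundance of every selected taxon per treatment, with exact zeros turned into `NA`. The sketch below reproduces that computation in base R on invented data; all object names are hypothetical. It selects taxa by name instead of by column position, which is one way to avoid relying on a fixed column range such as `1:21`.

```r
# Toy relative abundances: rows are samples, columns are taxa (invented values)
toy_abund <- data.frame(
  treatment = c("Untreated", "Untreated", "lyPMA", "lyPMA"),
  tax1 = c(0.2, 0.4, 0.0, 0.0),
  tax2 = c(0.1, 0.3, 0.5, 0.7)
)

# Mean relative abundance per treatment
toy_mean <- aggregate(cbind(tax1, tax2) ~ treatment, data = toy_abund, FUN = mean)

# Taxa x treatment layout, selected by name rather than by position
toy_rel <- t(as.matrix(toy_mean[, c("tax1", "tax2")]))
colnames(toy_rel) <- toy_mean$treatment

# Treat exact zeros as missing so that they are not drawn as points
toy_rel[toy_rel == 0] <- NA
toy_rel
```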
#Volcano plot

figS7 <- ggplot(filt_maaslin_all, aes(y = -log10(qval), x = Estimate, col = metadata)) +
        theme_classic(base_family = "sans") +
        #labs(tag = "A") +
        geom_point(size = 2, alpha = 0.3, stroke = 0) +
        xlab("Change estimate of CLR-normalized read counts") +
        ylab("-log<sub>10</sub>(*q*-value)") +
        #ylim(c(-1, 35)) +
        geom_hline(yintercept = 1, col = "gray") +
        geom_vline(xintercept = 0, col = "gray") +
        annotate(family = "sans",
                 geom='richtext',
                 x=0, y=20,
                 label = "<i>q</i>-value = 0.1, fold-change = 0") +
        theme(legend.position = "top", axis.title.y = ggtext::element_markdown(), legend.text = element_markdown()) +
        scale_color_manual(values = c("#a65628",
                                      "grey",
                                      "#fb9a99",
                                      "#33a02c",
                                      "#b2df8a",
                                      "#1f78b4",
                                      "#a6cee3"),
                           breaks = c("log10.Final_reads",
                                      "sample_type",
                                      "lypma",
                                      "benzonase",
                                      "host_zero",
                                      "molysis",
                                      "qiaamp"), 
                           labels = c("log<sub>10</sub>(Final reads)",
                                      "Sample type",
                                      "lyPMA",
                                      "Benzonase",
                                      "HostZERO",
                                      "MolYsis",
                                      "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        guides(col = guide_legend(title = "Factors", title.position = "top", nrow = 1))

figS7

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS8.png",   # Path of the output file
    width = 180, # The width of the plot in mm (see units)
    height = 90, # The height of the plot in mm (see units)
    units = "mm",
    res = 600
) #fixing multiple page issue
figS7
dev.off()
## quartz_off_screen 
##                 2
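
The two reference lines in the volcano plot split the points by significance and direction: the horizontal line at y = 1 corresponds to q = 0.1, since -log10(0.1) = 1, and the vertical line at x = 0 separates increases from decreases. A minimal sketch of that classification on invented values follows; the object names and the `direction` column are hypothetical and are not used in the plotting code above.

```r
# Toy results (invented values)
toy_res <- data.frame(
  feature  = c("tax1", "tax2", "tax3", "tax4"),
  Estimate = c(2.1, -1.3, 0.8, -0.2),
  qval     = c(0.001, 0.04, 0.1, 0.6)
)

q_cut <- 0.1
toy_res$neg_log10_q <- -log10(toy_res$qval)  # y-axis of the volcano plot

# Points strictly above the line at -log10(q_cut) count as significant
toy_res$direction <- ifelse(toy_res$qval >= q_cut, "not significant",
                     ifelse(toy_res$Estimate > 0, "increased", "decreased"))
table(toy_res$direction)
```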

Fig. 4. Balloon plot

Fig. 4. Mean relative abundance of the top 20 taxa, ranked by q-value, identified by differential abundance analysis using MaAsLin. Analyses were stratified by sample type: (A) bronchoalveolar lavage, (B) nasal swabs, and (C) sputum. Statistical significance was noted at q-value < 0.1.

#ffff33 qia

f5b <- merge(filt_fit_data_bal$rel %>%
              gather(treatment,
                     value,
                     Untreated:QIAamp,
                     factor_key=TRUE),
      filt_fit_data_bal$data_italic %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     qval,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
      by.x = c('feature', 'treatment'),
      by.y = c('feature', 'treatment'),
      all = T) %>%
        
        merge(filt_fit_data_bal$data_sig %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     sig,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
              by.x = c('feature', 'treatment'),
              by.y = c('feature', 'treatment'),
              all = T) %>%
        mutate(sig = case_when(is.na(qval) ~ "> 0.1",
                               qval < 0.1 ~ "< 0.1",
                               .default = "> 0.1")) %>%

#Balloon plot
        ggballoonplot(size = "value", y = "feature", x= "treatment", fill = "sig") +
        theme_classic(base_family = "sans") +
        #colors for qvalues
        #xlab("Experimental group") +
        #ylab("Species") +
        labs(tag = "A")  +
        ggtitle("BAL") +
        theme(panel.grid.major = element_line(colour = "grey"),
              legend.position = "top",
              #axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
              #Element markdown for taxa name italicizing
              axis.text.y = ggtext::element_markdown(size = 8),
              axis.title.y = element_blank(),
              axis.title.x = element_blank(),
              plot.margin = unit(c(0,0.2,0,1), 'lines'))  +
        scale_fill_manual(values = c("< 0.1" = "red", "> 0.1" = "grey"), aes(y = feature,
                      x = treatment,
                      label = sig)) +
        guides(fill = guide_legend(title = c(expression(paste(italic("q"),
                                                       "-value",
                                                       sep = ""))),
                                   title.position = "top"),
               size = guide_legend(title = "Relative abundance",
                                   title.position = "top",
                                   order = 1,
                                   nrow = 1)
               ) + 
        scale_x_discrete(labels=c("control" = "Untreated",
                                  "lypma" = "lyPMA",
                                  "benzonase" = "Benzonase",
                                  "host_zero" = "Host-zero",
                                  "molysis" = "MolYsis",
                                  "qiaamp" = "QIAamp")
                         )
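
The data behind each balloon plot are assembled by reshaping the abundance and q-value tables to long format and joining them on feature and treatment. The sketch below shows that join in base R on invented data. The helper `to_long` and all toy objects are hypothetical; `to_long` stands in for the `gather()` calls used above. The significance label is derived from the numeric q-value, so a missing q-value (for example, the untreated group) is treated as not significant.

```r
# Toy wide tables keyed by feature (invented values)
toy_rel <- data.frame(feature = c("tax1", "tax2"),
                      Untreated = c(0.30, 0.10),
                      lyPMA = c(0.05, 0.20))
toy_q <- data.frame(feature = c("tax1", "tax2"),
                    lyPMA = c(0.02, 0.40))

# Hypothetical helper: wide -> long with base R
to_long <- function(wide, value_name) {
  cols <- setdiff(names(wide), "feature")
  long <- data.frame(feature = rep(wide$feature, times = length(cols)),
                     treatment = rep(cols, each = nrow(wide)),
                     value = unlist(wide[cols], use.names = FALSE))
  names(long)[3] <- value_name
  long
}

# Full outer join keeps the untreated rows, which have no q-value
toy_plot <- merge(to_long(toy_rel, "value"), to_long(toy_q, "qval"),
                  by = c("feature", "treatment"), all = TRUE)

# Label significance from the numeric q-value
toy_plot$sig <- ifelse(!is.na(toy_plot$qval) & toy_plot$qval < 0.1, "< 0.1", "> 0.1")
toy_plot
```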

f5c <- merge(filt_fit_data_ns$rel %>%
              gather(treatment,
                     value,
                     Untreated:QIAamp,
                     factor_key=TRUE),
      filt_fit_data_ns$data_italic %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     qval,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
      by.x = c('feature', 'treatment'),
      by.y = c('feature', 'treatment'),
      all = T) %>%
        
        merge(filt_fit_data_ns$data_sig %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     sig,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
              by.x = c('feature', 'treatment'),
              by.y = c('feature', 'treatment'),
              all = T) %>%
        mutate(sig = case_when(is.na(qval) ~ "> 0.1",
                               qval < 0.1 ~ "< 0.1",
                               .default = "> 0.1")) %>%
#Balloon plot
        ggballoonplot(size = "value", y = "feature", x= "treatment", fill = "sig") +
        
        theme_classic(base_family = "sans") +
        #colors for qvalues
        gradient_fill(c("#006d2c", "#edf8fb")) +
        xlab("Experimental group") +
        ylab("Species") +
        labs(tag = "B")  +
        ggtitle("Nasal swab") +
        theme(panel.grid.major = element_line(colour = "grey"),
              legend.position = "top",
              #axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
              #Element markdown for taxa name italicizing
              axis.text.y = ggtext::element_markdown(size = 8),
              axis.title.x = element_blank(),
              plot.margin = unit(c(0,0.2,0,1), 'lines'))  +
        #Fill colors keyed by significance label
        scale_fill_manual(values = c("< 0.1" = "red", "> 0.1" = "grey"), aes(y = feature,
                      x = treatment,
                      label = sig)) +
        guides(fill = guide_legend(title = c(expression(paste(italic("q"),
                                                       "-value",
                                                       sep = ""))),
                                   title.position = "top",
                                   override.aes = list(size=5)),
               size = guide_legend(title = "Relative abundance",
                                   title.position = "top",
                                   order = 1,
                                   nrow = 1),
               ) + 
        scale_x_discrete(labels=c("control" = "Untreated",
                                  "lypma" = "lyPMA",
                                  "benzonase" = "Benzonase",
                                  "host_zero" = "Host-zero",
                                  "molysis" = "MolYsis",
                                  "qiaamp" = "QIAamp")
                         )

f5d <- merge(filt_fit_data_spt$rel %>%
              gather(treatment,
                     value,
                     Untreated:QIAamp,
                     factor_key=TRUE),
      filt_fit_data_spt$data_italic %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     qval,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
      by.x = c('feature', 'treatment'),
      by.y = c('feature', 'treatment'),
      all = T) %>%
        merge(filt_fit_data_spt$data_sig %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     sig,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
              by.x = c('feature', 'treatment'),
              by.y = c('feature', 'treatment'),
              all = T) %>%
        mutate(sig = case_when(sig < 0.1 ~ "< 0.1",
                               .default = "> 0.1")) %>%
#Balloon plot
        ggballoonplot(size = "value", y = "feature", x= "treatment", fill = "sig") +
        
        theme_classic(base_family = "sans") +
        #colors for qvalues
        gradient_fill(c("#006d2c", "#edf8fb")) +
        xlab("Experimental group") +
        #ylab("Species") +
        labs(tag = "C")  +
        ggtitle("Sputum") +
        theme(panel.grid.major = element_line(colour = "grey"),
              legend.position = "top",
              #axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
              #Element markdown for taxa name italicizing
              axis.text.y = ggtext::element_markdown(size = 8),
              #axis.title.x = element_blank(),
              axis.title.x = element_text(margin = margin(t = 0)),
              axis.title.y = element_blank(),
              plot.margin = unit(c(0,0.2,0,1), 'lines'))  +
        #Fill colours for the q-value categories (red: q < 0.1, grey: otherwise)
        scale_fill_manual(values = c("red", "grey"), aes(y = feature,
                      x = treatment,
                      label = sig)) +
        guides(fill = guide_legend(title = c(expression(paste(italic("q"),
                                                       "-value",
                                                       sep = ""))),
                                   title.position = "top"),
               size = guide_legend(title = "Relative abundance",
                                   title.position = "top",
                                   order = 1,
                                   nrow = 1),
               ) + 
        scale_x_discrete(labels=c("control" = "Untreated",
                                  "lypma" = "lyPMA",
                                  "benzonase" = "Benzonase",
                                  "host_zero" = "Host-zero",
                                  "molysis" = "MolYsis",
                                  "qiaamp" = "QIAamp")
                         )
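Both balloon-plot panels follow the same pattern: reshape each wide table (one column per treatment) to long form, merge on feature and treatment, then bin the q-values. A minimal base-R sketch of that pattern on toy data might look like the following. The helper `to_long()` and all values are illustrative assumptions, not objects from this analysis.

```r
# Toy wide tables: one row per feature, one column per treatment (made-up values)
rel <- data.frame(feature   = c("Taxon_a", "Taxon_b"),
                  Untreated = c(0.50, 0.10),
                  lyPMA     = c(0.40, 0.20),
                  QIAamp    = c(0.30, 0.30))
qval <- data.frame(feature = c("Taxon_a", "Taxon_b"),
                   lyPMA   = c(0.05, 0.50),
                   QIAamp  = c(0.20, 0.01))

# Hypothetical helper: stack every non-feature column into (feature, treatment, value)
to_long <- function(df, value_name) {
        cols <- setdiff(names(df), "feature")
        out <- data.frame(feature = rep(df$feature, times = length(cols)),
                          treatment = rep(cols, each = nrow(df)),
                          value = unlist(df[cols], use.names = FALSE))
        names(out)[3] <- value_name
        out
}

# Full outer join keeps the untreated rows, which have no q-value
long <- merge(to_long(rel, "value"), to_long(qval, "qval"),
              by = c("feature", "treatment"), all = TRUE)

# Missing q-values fall into the "> 0.1" bin, as with case_when(.default = ...)
long$sig <- ifelse(!is.na(long$qval) & long$qval < 0.1, "< 0.1", "> 0.1")
long
```

Wrapping the reshape in one helper would also avoid repeating the same `gather()` and `merge()` calls for each sample type.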

fig4 <- ggarrange(f5c %>% lemon::g_legend() %>% as_ggplot,
                  f5b,
                  f5c,
                  f5d,
                  #b, b %>% lemon::g_legend() %>% as_ggplot,
                  ncol=1, heights = c(1.5, 4, 4, 4),
                  legend = "none",
                  align = "hv")

fig4

# png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/Figure4.png",   # The directory you want to save the file in
#     width = 180, # The width of the plot in mm (see units)
#     height = 220, # The height of the plot in mm (see units)
#     units = "mm",
#     res = 600
# ) #fixing multiple page issue
# 
# fig4
# 
# 
# 
# 
# # alpha diversity plots
# #ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
# #          ncol = 1) # alpha diversity plots
# 
# dev.off()
# 



pdf(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/Figure4.pdf",   # The directory you want to save the file in
    width = 183 / 25.4, # Convert width from mm to inches (183 mm)
    height = 220 / 25.4, # Convert height from mm to inches (220 mm)
    paper = "special",   # Prevents default paper size settings
    onefile = FALSE      # Ensures a single page per PDF file
)


fig4
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
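`pdf()` takes its dimensions in inches, so millimetre sizes have to be divided by 25.4 first. The sketch below shows that conversion with a small helper and a placeholder plot. `mm_to_in()`, the temporary file path, and the plot are assumptions for illustration only.

```r
# Hypothetical helper: convert millimetres to inches (1 inch = 25.4 mm)
mm_to_in <- function(mm) mm / 25.4

# Placeholder output path, not the manuscript folder
out_file <- tempfile(fileext = ".pdf")

pdf(file = out_file,
    width = mm_to_in(183),
    height = mm_to_in(220),
    paper = "special",   # use the width/height given above
    onefile = FALSE)
plot(1:10, main = "placeholder plot")   # stands in for printing fig4
invisible(dev.off())

file.exists(out_file)
```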

Table S8. List of DA taxa

Table S8. List of differentially abundant taxa identified by MaAsLin (Beghini et al., 2021). Significant associations (q-value < 0.1) with a large change ( | change estimate of CLR-transformed count | > 4) are listed.

filt_maaslin_all %>% subset(., .$qval < 0.1 & abs(.$Estimate) > 4) 
tableS8 <- filt_maaslin_all %>% subset(., .$qval < 0.1 & abs(.$Estimate) > 4) %>% dplyr::select(c("feature", "metadata", "Estimate", "qval")) %>%
        mutate(feature = paste("<i>", gsub("_", " ", feature), sep = "") %>%
                       gsub(" sp", "</i> sp.", .) %>% gsub(" group", "", .)) %>%
        mutate(feature = case_when(!grepl("</i>", feature) ~ paste(feature, "</i>", sep = ""),
                                   .default = feature))%>% 
        rename(Taxa = "feature",
                     `Fixed effect` = "metadata",
                     "Change estimate of CLR transformed count" = "Estimate",
                     `<i>q</i>-value` = "qval") %>% 
        mutate(`Fixed effect` = case_when(
                         `Fixed effect` == "sample_type" ~ "Sample type",
                         `Fixed effect` == "lypma" ~ "lyPMA",
                         `Fixed effect` == "host_zero" ~ "HostZERO",
                         `Fixed effect` == "benzonase" ~ "Benzonase",
                         `Fixed effect` == "molysis" ~ "MolYsis",
                         `Fixed effect` == "qiaamp" ~ "QIAamp",
                         .default = `Fixed effect`),
               "Change estimate of CLR transformed count" = round(`Change estimate of CLR transformed count`, 3),
               `<i>q</i>-value` = round(`<i>q</i>-value`, 3)) %>%
        remove_rownames() %>% 
        kbl(format = "html", escape = FALSE) %>%
        kable_styling(full_width = 0, html_font = "sans")

tableS8
| Taxa | Fixed effect | Change estimate of CLR transformed count | q-value |
|---|---|---|---|
| Actinomyces graevenitzii | Sample type | 6.348 | 0.000 |
| Actinomyces odontolyticus | Sample type | 4.810 | 0.023 |
| Actinomyces oris | Sample type | 6.431 | 0.000 |
| Actinomyces sp. HMSC035G02 | Sample type | 4.653 | 0.000 |
| Actinomyces sp. ICM47 | Sample type | 4.832 | 0.000 |
| Corynebacterium accolens | Sample type | 13.471 | 0.000 |
| Corynebacterium atypicum | Sample type | 4.963 | 0.000 |
| Rothia dentocariosa | Sample type | 7.048 | 0.000 |
| Rothia mucilaginosa | Sample type | 6.192 | 0.010 |
| Cutibacterium acnes | Sample type | 12.358 | 0.000 |
| Cutibacterium granulosum | Sample type | 6.082 | 0.005 |
| Prevotella oris | Sample type | -4.478 | 0.032 |
| Gemella sanguinis | Sample type | 6.933 | 0.000 |
| Staphylococcus epidermidis | Sample type | 8.345 | 0.000 |
| Dolosigranulum pigrum | Sample type | 7.626 | 0.030 |
| Granulicatella elegans | Sample type | 4.678 | 0.000 |
| Streptococcus australis | Sample type | 5.144 | 0.000 |
| Streptococcus gordonii | Sample type | 4.994 | 0.000 |
| Streptococcus infantis | Sample type | 6.457 | 0.000 |
| Streptococcus mitis | Sample type | 4.856 | 0.034 |
| Streptococcus parasanguinis | Sample type | 7.562 | 0.000 |
| Streptococcus salivarius | Sample type | 7.307 | 0.000 |
| Streptococcus sanguinis | Sample type | 5.021 | 0.001 |
| Streptococcus sp. F0442 | Sample type | 4.818 | 0.000 |
| Finegoldia magna | Sample type | -4.712 | 0.018 |
| Cupriavidus sp. | Sample type | -7.565 | 0.000 |
| Sutterella parvirubra | lyPMA | 4.478 | 0.000 |
| Sutterella parvirubra | HostZERO | 4.048 | 0.000 |
| Pseudomonas aeruginosa | Sample type | 7.500 | 0.000 |
| Candida albicans | Sample type | -5.990 | 0.013 |
| Candida dubliniensis | Sample type | -6.395 | 0.010 |
| Malassezia restricta | Sample type | 4.803 | 0.043 |
| Alloprevotella tannerae | Sample type | -5.883 | 0.008 |
| Prevotella sp. oral taxon 306 | Sample type | -4.575 | 0.025 |
| Prevotella veroralis | Sample type | -4.982 | 0.017 |
| Lactobacillus fermentum | Sample type | -5.182 | 0.029 |
| Staphylococcus haemolyticus | Sample type | -5.489 | 0.028 |
| Candida orthopsilosis | Sample type | -4.995 | 0.026 |
save_kable(tableS8, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS8.html", self_contained = T)
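The taxon labels in Table S8 are built by wrapping names in `<i>` tags and closing the tag before "sp." so that only the Latin binomial is italicized. A standalone sketch of that string handling is given below. `italicize_taxa()` is a hypothetical helper, and it deliberately differs from the pipeline above in one respect: it matches " sp" only as a whole word (`\\b` with `perl = TRUE`), on the assumption that names such as "sputigena" should not be split.

```r
# Hypothetical helper: turn underscore-separated taxon names into HTML labels
italicize_taxa <- function(x) {
        out <- paste0("<i>", gsub("_", " ", x))
        # Close the italic tag before a standalone "sp" token and add the period
        out <- gsub(" sp\\b", "</i> sp.", out, perl = TRUE)
        out <- gsub(" group", "", out)
        # Names without "sp" still need their closing tag
        ifelse(grepl("</i>", out), out, paste0(out, "</i>"))
}

italicize_taxa(c("Rothia_mucilaginosa",
                 "Actinomyces_sp_ICM47",
                 "Capnocytophaga_sputigena"))
```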

3.4. Effect of treatments on functional analysis results

xi. Did depletion methods change diversity, by sample type?

Functional diversity

Fig. S9. Alpha and beta diversity of predicted functions

Fig. S9. Alpha and beta diversity of predicted functions by sample type and treatment method. (A) Functional richness with statistical test results (linear mixed-effects model stratified by sample type); points with bars show the mean ± 1 SD. (B) Within-subject Morisita-Horn dissimilarity between each treated sample and its untreated control, shown as boxplots.

sample_data <- sample_data(phyloseq$phyloseq_path_rpk) %>% data.frame(check.names = F) %>% subset(., !is.nan(.$simpson))
phyloseq_rel_nz_f <- subset_samples(phyloseq$phyloseq_path_rpk, S.obs != 0 & sample_type %in% c("BAL", "Nasal", "Sputum", "Mock"))

sample_data(phyloseq_rel_nz_f)$log10.Final_reads <- log10(sample_data(phyloseq_rel_nz_f)$Final_reads)
sample_data(phyloseq_rel_nz_f)$sampletype_treatment <- paste(sample_data(phyloseq_rel_nz_f)$sample_type, sample_data(phyloseq_rel_nz_f)$treatment, sep = ":")
f4a <- ggplot(subset(sample_data(phyloseq$phyloseq_path_rpk) %>% 
                             data.frame,
                     sample_data(phyloseq$phyloseq_path_rpk)$sample_type %in% 
                             c("Sputum","Nasal", "BAL"#, "Mock"
                               )), aes(x = treatment, y = S.obs)) +
        geom_jitter(aes(color = treatment), position = position_jitter(0.2), size = 1.2,
                    alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        ylab("Functional richness") +
        xlab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        labs(tag = "A") +
        theme(plot.tag = element_text(size = 15),  axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
        facet_wrap(~sample_type, nrow = 1) + 
        guides(col = guide_legend(nrow = 1)) 

dat_text <- data.frame(
  label = c(
          #"**", "***", "***", "", "***", #label for Mock
          "", "*", "**", "***", "*", #label for BAL
          "", "", "***", "**", "***", 
          "*", "**", "***", "***", "***"),
  sample_type = c(
          #"Mock", "Mock", "Mock", "Mock", "Mock", 
          "BAL", "BAL", "BAL", "BAL", "BAL", 
          "Nasal", "Nasal", "Nasal", "Nasal", "Nasal", 
          "Sputum", "Sputum", "Sputum", "Sputum", "Sputum"),
  treatment     = c(
          #"lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp", 
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp",
          "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
  S.obs = c(
          #420, 400, 330, 410, 400,
          250, 230, 300, 320, 300,
          210, 220, 230, 240, 250,
          330, 350, 370, 350, 360)
)

dat_text$sample_type <- factor(dat_text$sample_type, levels = c(#"Mock", 
                                                                "BAL", "Nasal", "Sputum"))
dat_text$treatment <- factor(dat_text$treatment, levels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))



f4a <- f4a + geom_text(
  data    = dat_text,
  mapping = aes(x = treatment, y = S.obs, label = label)
)
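The asterisks in `dat_text` are typed by hand. As a sketch, the same labels could be derived from p-values using the thresholds applied in the tidy tables of this report (0.001, 0.01 and 0.05). `p_to_stars()` is a hypothetical helper and the p-values below are made up.

```r
# Hypothetical helper: map p-values to significance labels
# Intervals are left-closed, so p = 0.05 itself gets no asterisk
p_to_stars <- function(p) {
        as.character(cut(p,
                         breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                         labels = c("***", "**", "*", ""),
                         right = FALSE))
}

p_to_stars(c(0.0005, 0.005, 0.03, 0.5))
```

Generating the labels this way keeps the annotation in step with the model output if the data are updated.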


#f3a <- f3a + geom_text(
#  data    = dat_text,
#  mapping = aes(x = treatment, y = S.obs, label = label)
#)


#distances of betadiversity - boxplots
horn_dist_long <- distance(phyloseq_rel_nz_f, method="horn") %>% as.matrix() %>% melt_dist() #making long data of distance matrices
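For reference, the Morisita-Horn dissimilarity between two abundance vectors can be written out directly. The sketch below follows the Horn-Morisita definition as given in the vegan `vegdist` documentation; `horn_dissimilarity()` is a hypothetical helper for illustration, not the function called above.

```r
# Hypothetical helper: Morisita-Horn dissimilarity between two abundance vectors
# d = 1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * N_x * N_y),
# with lambda = sum(x^2) / N^2 and N = sum(x)
horn_dissimilarity <- function(x, y) {
        n_x <- sum(x)
        n_y <- sum(y)
        lambda_x <- sum(x^2) / n_x^2
        lambda_y <- sum(y^2) / n_y^2
        1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * n_x * n_y)
}

horn_dissimilarity(c(1, 2, 3), c(2, 4, 6))   # same proportions
horn_dissimilarity(c(1, 0), c(0, 1))         # no shared features
```

The index depends only on relative composition, which is why a profile and a rescaled copy of it have zero dissimilarity.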

#Adding sample type and treatment name. 
#this can be also done by merging metadata into the `horn_dist_long`
names <- data.frame(str_split_fixed(horn_dist_long$iso1, "_", 3))
names2 <- data.frame(str_split_fixed(horn_dist_long$iso2, "_", 3))
horn_dist_long$sample_id_1 <- paste(names$X1, names$X2, sep = "_")
horn_dist_long$method_1 <- ifelse(grepl("lyPMA", horn_dist_long$iso1),"lypma", 
                                         ifelse(grepl("ben", horn_dist_long$iso1),"benzonase", 
                                                ifelse(grepl("host", horn_dist_long$iso1),"host_zero", 
                                                       ifelse(grepl("qia", horn_dist_long$iso1),"qiaamp", 
                                                              ifelse(grepl("moly", horn_dist_long$iso1),"molysis", 
                                                                     "control")))))


#Adding data for iso 2 also should be done
horn_dist_long$sample_id_2 <- paste(names2$X1, names2$X2, sep = "_")
horn_dist_long$method_2 <-ifelse(grepl("lyPMA", horn_dist_long$iso2),"lypma", 
                                        ifelse(grepl("ben", horn_dist_long$iso2),"benzonase", 
                                               ifelse(grepl("host", horn_dist_long$iso2),"host_zero", 
                                                      ifelse(grepl("qia", horn_dist_long$iso2),"qiaamp", 
                                                             ifelse(grepl("moly", horn_dist_long$iso2),"molysis", 
                                                                    "control")))))


#subsetting distances of my interest
horn_dist_long$sample_id_1 <- ifelse(grepl("pos", horn_dist_long$sample_id_1, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", horn_dist_long$sample_id_1, ignore.case = T),"Neg.",
                                        horn_dist_long$sample_id_1))
horn_dist_long$sample_id_2 <- ifelse(grepl("pos", horn_dist_long$sample_id_2, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", horn_dist_long$sample_id_2, ignore.case = T),"Neg.",
                                        horn_dist_long$sample_id_2))
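The nested `ifelse()` calls above assign a method label from substrings of the sample name. One possible alternative is a named pattern vector applied in priority order, sketched below. `classify_method()` and the sample names are hypothetical; the patterns and their priority are copied from the code above.

```r
# Hypothetical helper: label each sample name by the first matching pattern
classify_method <- function(sample_name) {
        patterns <- c(lypma = "lyPMA", benzonase = "ben", host_zero = "host",
                      qiaamp = "qia", molysis = "moly")
        out <- rep("control", length(sample_name))
        # Assign in reverse order so that earlier patterns take priority
        for (method in rev(names(patterns))) {
                out[grepl(patterns[[method]], sample_name)] <- method
        }
        out
}

# Made-up sample names
classify_method(c("NS_01_lyPMA", "CFB_02_ben", "BAL_03_host",
                  "NS_04_qia", "NS_05_moly", "NS_06"))
```

The same helper could then be applied to both `iso1` and `iso2` rather than repeating the block.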


path_horn_dist_long_within_sampleid_from_control <- subset(horn_dist_long, horn_dist_long$sample_id_1 == horn_dist_long$sample_id_2) # data within samples

path_horn_dist_long_within_sampleid_from_control <- subset(path_horn_dist_long_within_sampleid_from_control,
                                                           path_horn_dist_long_within_sampleid_from_control$method_1 != path_horn_dist_long_within_sampleid_from_control$method_2) # remove irrelevant association

path_horn_dist_long_within_sampleid_from_control <- subset(path_horn_dist_long_within_sampleid_from_control, (path_horn_dist_long_within_sampleid_from_control$method_1 == "control") + (path_horn_dist_long_within_sampleid_from_control$method_2 == "control") != 0)


path_horn_dist_long_within_sampleid_from_control$treatment <- path_horn_dist_long_within_sampleid_from_control$method_1

path_horn_dist_long_within_sampleid_from_control$treatment <- ifelse(path_horn_dist_long_within_sampleid_from_control$treatment == "control", path_horn_dist_long_within_sampleid_from_control$method_2, path_horn_dist_long_within_sampleid_from_control$treatment) 
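The three subsetting steps above keep only pairs from the same sample in which exactly one member is the untreated control, then label each pair by the non-control method. A compact sketch of that logic on a toy distance table is shown below; the data frame and its values are made up for illustration.

```r
# Toy long-format distance table (made-up values)
d <- data.frame(sample_id_1 = c("S1", "S1", "S1", "S1"),
                sample_id_2 = c("S1", "S1", "S2", "S1"),
                method_1    = c("control", "lypma", "control", "lypma"),
                method_2    = c("lypma", "control", "lypma", "qiaamp"),
                dist        = c(0.2, 0.3, 0.9, 0.5))

# Same sample, different methods, and one side is the untreated control
keep <- d$sample_id_1 == d$sample_id_2 &
        d$method_1 != d$method_2 &
        (d$method_1 == "control" | d$method_2 == "control")
within <- d[keep, ]

# The treatment label is whichever method is not the control
within$treatment <- ifelse(within$method_1 == "control",
                           within$method_2, within$method_1)
within
```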


#Setting key method
path_horn_dist_long_within_sampleid_from_control$sample_type <- ifelse(grepl("NS", path_horn_dist_long_within_sampleid_from_control$iso1), "Nasal",
                                                                  ifelse(grepl("CFB", path_horn_dist_long_within_sampleid_from_control$iso1), "Sputum",
                                                                         ifelse(grepl("BAL", path_horn_dist_long_within_sampleid_from_control$iso1), "BAL",
                                                                                ifelse(grepl("pos|POS", path_horn_dist_long_within_sampleid_from_control$iso1, ignore.case = T), "Mock",
                                                                                       ifelse(grepl("neg|N_EXT", path_horn_dist_long_within_sampleid_from_control$iso1), "Neg.",NA)))))

#Making a column for baseline (controls, from where?)
path_horn_dist_long_within_sampleid_from_control <- path_horn_dist_long_within_sampleid_from_control %>% 
        mutate(dist_from = case_when(method_1 == "control" ~ iso1,
                                     method_2 == "control" ~ iso2))

dummy <- data.frame(iso1 = path_horn_dist_long_within_sampleid_from_control$dist_from %>% unique,
           iso2 = path_horn_dist_long_within_sampleid_from_control$dist_from %>% unique,
           dist = 0,
           treatment = "Untreated",
           method_1 = "control",
           method_2 = "control"
           )
names <- data.frame(str_split_fixed(dummy$iso1, "_", 3))
names2 <- data.frame(str_split_fixed(dummy$iso2, "_", 3))
dummy$sample_id_1 <- paste(names$X1, names$X2, sep = "_")
#Adding data for iso 2 also should be done
dummy$sample_id_2 <- paste(names2$X1, names2$X2, sep = "_")


#subsetting distances of my interest
dummy$sample_id_1 <- ifelse(grepl("pos", dummy$sample_id_1, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", dummy$sample_id_1, ignore.case = T),"Neg.",
                                        dummy$sample_id_1))
dummy$sample_id_2 <- ifelse(grepl("pos", dummy$sample_id_2, ignore.case = T),"Mock", 
                                 ifelse(grepl("neg|n_", dummy$sample_id_2, ignore.case = T),"Neg.",
                                        dummy$sample_id_2))
dummy$sample_type <- ifelse(grepl("NS", dummy$iso1), "Nasal",
                            ifelse(grepl("CFB", dummy$iso1), "Sputum",
                                   ifelse(grepl("BAL", dummy$iso1), "BAL",
                                          ifelse(grepl("pos|POS", dummy$iso1, ignore.case = T), "Mock",
                                                 ifelse(grepl("neg|N_EXT", dummy$iso1), "Neg.",NA)))))
dummy <- subset(dummy, !is.na(dummy$sample_type))
path_horn_dist_long_within_sampleid_from_control <- bind_rows(path_horn_dist_long_within_sampleid_from_control, dummy)


path_horn_dist_long_within_sampleid_from_control$subject_id <- path_horn_dist_long_within_sampleid_from_control$sample_id_1

path_horn_dist_long_within_sampleid_from_control$treatment <-
        factor(path_horn_dist_long_within_sampleid_from_control$treatment,
               levels = c("Untreated", "lypma", "benzonase", "host_zero", "molysis", "qiaamp"))


f4b2 <- path_horn_dist_long_within_sampleid_from_control %>% 
        mutate(across(sample_type, factor, levels=c(#"Mock", 
                                                    "BAL", "Nasal","Sputum"
                                                    ))) %>%
        subset(., .$sample_type != "Neg.") %>% 
        subset(., .$treatment != "Untreated") %>%
  mutate(treatment = factor(treatment, levels = c("lypma", "benzonase", "host_zero", "molysis", "qiaamp"),
                            labels = c("lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))) %>%
        ggplot(aes(x = dist, y = treatment, col = treatment)) +
        geom_boxplot() + 
        facet_wrap(~sample_type, nrow = 4) +
        scale_y_discrete(limits=rev) +
        scale_color_manual(values = c(#"#e31a1c",
                                      "#fb9a99","#33a02c",
                                      "#b2df8a","#1f78b4","#a6cee3"),
                           name = "Treatment",
                           breaks = c(#"Untreated", 
                                      "lyPMA", "Benzonase",
                                      "HostZERO", "MolYsis", "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        xlab("Morisita-Horn dissimilarity from untreated") +
        ylab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        theme(plot.tag = element_text(size = 15),
              axis.text.y = element_blank(),
              axis.ticks.y = element_blank(),
              legend.position = "none") +
        labs(tag = "B") +
        geom_vline(xintercept = 0, col = "black", linetype="dotted") +
        #coord_cartesian(xlim=c(-0.5, 1)) +
        #geom_text(aes(x = 0, label = treatment), hjust = 0, nudge_x = -.55, size = 3, color = "black", family = "sans") +
        #geom_text(aes(x = 0, label = text), hjust = 0, nudge_x = -0.4, size = 3, color = "black", family = "sans") +
        scale_x_continuous(breaks = c(-0.25, 0, 0.25, 0.5, 0.75),
                           labels = c(-0.25, "0 (low bias)", 0.25, 0.5, "0.75 (high bias)"))




figS8 <- ggarrange(f4a, f4b2, ncol = 1, common.legend = T, align = "hv") +
        guides(fill = guide_legend(nrow = 1))


figS8

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS10.png",   # The directory you want to save the file in
    width = 180, # The width of the plot in mm (see units)
    height = 170, # The height of the plot in mm (see units)
    units = "mm",
    res = 600
) #fixing multiple page issue
figS8

# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2

Functional richness - LMER all samples

The effects of some treatments were offset by the interaction terms; the association was therefore specific to sample type.

Effect size, standard error (SE) and p-value of a statistical test for functional richness with an interaction term, using a linear mixed-effects model (functional richness ~ sample type * treatment + (1|subject_id)).

raw result

library(lmerTest)
sample_data <- sample_data(phyloseq$phyloseq_path_rpk)
sample_data$log_centered_final_reads <- log(sample_data$Final_reads + 1) - median(log((subset(sample_data, sample_data$sample_type %in% c("BAL") & sample_data$treatment %in% c("Untreated")) %>% .$Final_reads) + 1))

lmer(S.obs ~ sample_type * treatment + (1|subject_id), data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum"))) 
## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: S.obs ~ sample_type * treatment + (1 | subject_id)
##    Data: sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL",  
##     "Nasal", "Sputum"))
## REML criterion at convergence: 855.5247
## Random effects:
##  Groups     Name        Std.Dev.
##  subject_id (Intercept) 35.90   
##  Residual               44.53   
## Number of obs: 95, groups:  subject_id, 20
## Fixed Effects:
##                          (Intercept)                      sample_typeNasal  
##                                23.60                                126.60  
##                    sample_typeSputum                        treatmentlyPMA  
##                               158.00                                 63.00  
##                   treatmentBenzonase                     treatmentHostZERO  
##                               137.20                                177.80  
##                     treatmentMolYsis                       treatmentQIAamp  
##                               203.20                                139.40  
##      sample_typeNasal:treatmentlyPMA      sample_typeSputum:treatmentlyPMA  
##                               -46.90                                 23.00  
##  sample_typeNasal:treatmentBenzonase  sample_typeSputum:treatmentBenzonase  
##                              -120.88                                -45.60  
##   sample_typeNasal:treatmentHostZERO   sample_typeSputum:treatmentHostZERO  
##                              -115.48                                -32.00  
##    sample_typeNasal:treatmentMolYsis    sample_typeSputum:treatmentMolYsis  
##                              -155.30                                -52.80  
##     sample_typeNasal:treatmentQIAamp     sample_typeSputum:treatmentQIAamp  
##                               -62.92                                -15.80

Tidy table

lmer(S.obs ~ sample_type * treatment + (1|subject_id), data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL", "Nasal", "Sputum"))) %>% 
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("sample_type|treatment", "", x))  %>% mutate(x = gsub(":", " * ", x)) %>% 
        column_to_rownames(var = "x") %>%  
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        #Wald interval: estimate +/- 1.96 * standard error (not the t value)
        mutate("Effect size (95% CI)" = paste(round(Estimate, 1) %>% format(nsmall = 1), 
                                " (", 
                                round(Estimate - 1.96 * `Std. Error`, 1) %>% format(nsmall = 1),
                                ", ",
                                round(Estimate + 1.96 * `Std. Error`, 1) %>% format(nsmall = 1),
                                ")", 
                                sep = ""),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>% 
        kbl(format = "html", escape = 0) %>% kable_styling(full_width = 0, html_font = "sans")
|  | Effect size (95% CI) | p-value |  |
|---|---|---|---|
| (Intercept) | 23.6 (21.8, 25.4) | 0.361 |  |
| Nasal | 126.6 (118.7, 134.5) | 0.000 | *** |
| Sputum | 158.0 (149.4, 166.6) | 0.000 | *** |
| lyPMA | 63.0 (58.6, 67.4) | 0.029 | * |
| Benzonase | 137.2 (127.7, 146.7) | 0.000 | *** |
| HostZERO | 177.8 (165.4, 190.2) | 0.000 | *** |
| MolYsis | 203.2 (189.1, 217.3) | 0.000 | *** |
| QIAamp | 139.4 (129.7, 149.1) | 0.000 | *** |
| Nasal * lyPMA | -46.9 (-49.3, -44.5) | 0.222 |  |
| Sputum * lyPMA | 23.0 (21.9, 24.1) | 0.566 |  |
| Nasal * Benzonase | -120.9 (-127.1, -114.7) | 0.002 | ** |
| Sputum * Benzonase | -45.6 (-47.8, -43.4) | 0.257 |  |
| Nasal * HostZERO | -115.5 (-121.4, -109.5) | 0.003 | ** |
| Sputum * HostZERO | -32.0 (-33.6, -30.4) | 0.425 |  |
| Nasal * MolYsis | -155.3 (-163.3, -147.3) | 0.000 | *** |
| Sputum * MolYsis | -52.8 (-55.4, -50.2) | 0.190 |  |
| Nasal * QIAamp | -62.9 (-66.2, -59.7) | 0.103 |  |
| Sputum * QIAamp | -15.8 (-16.6, -15.0) | 0.693 |  |
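A 95% Wald interval for a fixed effect is the estimate plus or minus about 1.96 standard errors. The sketch below shows that calculation and the "estimate (lower, upper)" formatting on made-up numbers; `wald_ci()` is a hypothetical helper and is not part of the analysis code.

```r
# Hypothetical helper: Wald confidence interval from an estimate and its standard error
wald_ci <- function(estimate, se, level = 0.95) {
        z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
        c(lower = estimate - z * se, upper = estimate + z * se)
}

# Made-up estimate and standard error
ci <- wald_ci(estimate = 10, se = 2)
label <- sprintf("%.1f (%.1f, %.1f)", 10, ci[["lower"]], ci[["upper"]])
label
```

For mixed models with few groups, a t-based or profile interval (for example from `confint()`) may be preferable to the normal approximation.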

Table S9. Functional richness - all & stratified

Table S9. Effect size, standard error (SE) and p-value of a statistical test for functional richness with an interaction term using a linear mixed-effects model (functional richness ~ treatment + (1|subject_id)). Stratified analyses were conducted for each sample type because the interaction term of sample type and treatment was significant (p-value < 0.001) in the ANOVA of the LMER (functional richness ~ sample type + treatment + sample type * treatment + (1|subject_id)). The baseline of the categorical variables is the untreated group. Statistical significance is noted with *: p-value < 0.05, **: p-value < 0.01 and ***: p-value < 0.001.

The association was not adjusted for sequencing depth.

Raw result - Mock

pfr_lmer_mock <- lm(S.obs ~ treatment, data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Mock"))) 
pfr_lmer_mock %>% summary
## 
## Call:
## lm(formula = S.obs ~ treatment, data = sample_data %>% data.frame %>% 
##     subset(., .$sample_type %in% c("Mock")))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.600  -4.167  -1.000   3.667  61.600 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         434.333      9.269  46.858  < 2e-16 ***
## treatmentlyPMA      -42.733     13.748  -3.108  0.00465 ** 
## treatmentBenzonase  -86.933     13.748  -6.323 1.28e-06 ***
## treatmentHostZERO  -100.733     13.748  -7.327 1.12e-07 ***
## treatmentMolYsis    -14.133     13.748  -1.028  0.31380    
## treatmentQIAamp     -40.333     13.748  -2.934  0.00708 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.7 on 25 degrees of freedom
## Multiple R-squared:  0.7624, Adjusted R-squared:  0.7148 
## F-statistic: 16.04 on 5 and 25 DF,  p-value: 4.177e-07

Raw result - BAL

dummy1 <- ((phyloseq$phyloseq_path_rpk %>% otu_table %>% data.frame(check.names = F)) != 0) %>% colSums()


pfr_lmer_bal <- lm(S.obs ~ treatment, data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("BAL"))) 

pfr_lmer_bal %>% summary
## 
## Call:
## lm(formula = S.obs ~ treatment, data = sample_data %>% data.frame %>% 
##     subset(., .$sample_type %in% c("BAL")))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -142.0  -49.3    0.2   60.3  158.4 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           23.60      38.25   0.617 0.543098    
## treatmentlyPMA        63.00      54.10   1.165 0.255666    
## treatmentBenzonase   137.20      54.10   2.536 0.018135 *  
## treatmentHostZERO    177.80      54.10   3.286 0.003113 ** 
## treatmentMolYsis     203.20      54.10   3.756 0.000974 ***
## treatmentQIAamp      139.40      54.10   2.577 0.016553 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 85.54 on 24 degrees of freedom
## Multiple R-squared:  0.4487, Adjusted R-squared:  0.3338 
## F-statistic: 3.906 on 5 and 24 DF,  p-value: 0.009864

Raw result - Nasal

pfr_lmer_ns <- lm(S.obs ~ treatment, data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Nasal"))) 
pfr_lmer_ns %>% summary
## 
## Call:
## lm(formula = S.obs ~ treatment, data = sample_data %>% data.frame %>% 
##     subset(., .$sample_type %in% c("Nasal")))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -57.6  -17.6    3.4   20.7   75.8 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         150.200      9.751  15.403 1.69e-15 ***
## treatmentlyPMA        7.800     16.890   0.462 0.647663    
## treatmentBenzonase   23.400     16.890   1.385 0.176484    
## treatmentHostZERO    69.400     16.890   4.109 0.000297 ***
## treatmentMolYsis     56.200     16.890   3.327 0.002391 ** 
## treatmentQIAamp      69.400     16.890   4.109 0.000297 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.84 on 29 degrees of freedom
## Multiple R-squared:  0.5248, Adjusted R-squared:  0.4428 
## F-statistic: 6.404 on 5 and 29 DF,  p-value: 0.0004024

Raw result - Sputum

pfr_lmer_spt <- lm(S.obs ~ treatment, data = sample_data %>% data.frame %>% subset(., .$sample_type %in% c("Sputum"))) 
pfr_lmer_spt %>% summary
## 
## Call:
## lm(formula = S.obs ~ treatment, data = sample_data %>% data.frame %>% 
##     subset(., .$sample_type %in% c("Sputum")))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -145.20  -18.00   -5.00   14.25  116.40 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          181.60      22.76   7.978 3.32e-08 ***
## treatmentlyPMA        86.00      32.19   2.671 0.013353 *  
## treatmentBenzonase    91.60      32.19   2.845 0.008934 ** 
## treatmentHostZERO    145.80      32.19   4.529 0.000138 ***
## treatmentMolYsis     150.40      32.19   4.672 9.57e-05 ***
## treatmentQIAamp      123.60      32.19   3.839 0.000790 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.9 on 24 degrees of freedom
## Multiple R-squared:  0.5542, Adjusted R-squared:  0.4613 
## F-statistic: 5.967 on 5 and 24 DF,  p-value: 0.00101

Tidy table

pfr_lmer_mock_kbl <- pfr_lmer_mock %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              pfr_lmer_mock %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        

pfr_lmer_bal_kbl <- pfr_lmer_bal %>% 
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              pfr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
pfr_lmer_ns_kbl <- pfr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              pfr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
pfr_lmer_spt_kbl <-  pfr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              pfr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


tables9 <- cbind(pfr_lmer_bal_kbl, pfr_lmer_ns_kbl, pfr_lmer_spt_kbl) %>%
    kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 1, "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")

tables9
             BAL                                  Nasal swab                           Sputum
             Effect size (95% CI)   p-value       Effect size (95% CI)   p-value       Effect size (95% CI)   p-value
(Intercept)  23.6 (-55.4, 102.6)    0.543         150.2 (130.3, 170.1)   0.000 ***     181.6 (134.6, 228.6)   0.000 ***
lyPMA        63.0 (-48.7, 174.7)    0.256         7.8 (-26.7, 42.3)      0.648         86.0 (19.6, 152.4)     0.013 *
Benzonase    137.2 (25.5, 248.9)    0.018 *       23.4 (-11.1, 57.9)     0.176         91.6 (25.2, 158.0)     0.009 **
HostZERO     177.8 (66.1, 289.5)    0.003 **      69.4 (34.9, 103.9)     0.000 ***     145.8 (79.4, 212.2)    0.000 ***
MolYsis      203.2 (91.5, 314.9)    0.001 ***     56.2 (21.7, 90.7)      0.002 **      150.4 (84.0, 216.8)    0.000 ***
QIAamp       139.4 (27.7, 251.1)    0.017 *       69.4 (34.9, 103.9)     0.000 ***     123.6 (57.2, 190.0)    0.001 ***
save_kable(tables9, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS9.html", self_contained = T)
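
The four tidy-table blocks above repeat the same steps for each model: extract the coefficients, attach the 95% confidence interval, round the p-value, and add significance stars. A minimal sketch of how those steps could be wrapped in one function is given below. The helper name `tidy_lm_table` and the toy data are assumptions made for illustration; they are not part of the original analysis, and only base R is used.

```r
# Hypothetical helper (not in the original script): formats one lm fit as
# "estimate (lower, upper)", a rounded p-value, and significance stars.
tidy_lm_table <- function(fit, digits = 1) {
  coefs <- summary(fit)$coefficients
  ci <- confint(fit)                       # 95% CI by default, same row order as coefs
  p <- coefs[, "Pr(>|t|)"]
  fmt <- function(x) formatC(x, format = "f", digits = digits)
  data.frame(
    "Effect size (95% CI)" = paste0(fmt(coefs[, "Estimate"]),
                                    " (", fmt(ci[, 1]), ", ", fmt(ci[, 2]), ")"),
    "p-value" = round(p, 3),
    # Same cut-offs as the case_when() calls above: < 0.001, < 0.01, < 0.05
    " " = as.character(cut(p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                           labels = c("***", "**", "*", " "), right = FALSE)),
    row.names = rownames(coefs),
    check.names = FALSE
  )
}

# Toy data standing in for sample_data (values are made up for illustration)
toy <- data.frame(
  S.obs = c(10, 12, 11, 30, 33, 31),
  treatment = factor(rep(c("Untreated", "HostZERO"), each = 3),
                     levels = c("Untreated", "HostZERO"))
)
toy_tbl <- tidy_lm_table(lm(S.obs ~ treatment, data = toy))
toy_tbl
```

With such a helper, each per-sample-type table would reduce to one call per model, and the stars are computed from the unrounded p-values, as in the original blocks.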

Function beta - all samples

Raw result, with interaction term

phyloseq_rel_nz_f <- transform_sample_counts(phyloseq$phyloseq_path_rpk, function(x) {x/sum(x)}) %>%
        subset_samples(S.obs != 0 & sample_type %in% c("Mock", "BAL", "Nasal", "Sputum"))


set.seed(seed)
horn_perm_inter_fun <- vegan::adonis2(by = "terms", distance(phyloseq_rel_nz_f, method="horn") ~ sample_type * treatment + subject_id,
                                  data = phyloseq_rel_nz_f %>% sample_data %>% data.frame(check.names = F), 
                                  strata = phyloseq_rel_nz_f %>% sample_data %>% data.frame(check.names = F) %>% .$subject_id,
                                  permutations = 10000)
set.seed(seed)
horn_perm_ns_fun <- vegan::adonis2(by = "terms", distance(subset_samples(phyloseq_rel_nz_f, sample_type == "Nasal"), method="horn") ~ lypma + benzonase + host_zero + molysis + qiaamp,
                               data = subset_samples(phyloseq_rel_nz_f, sample_type == "Nasal") %>%
                                       sample_data %>% data.frame(check.names = F),
                               strata = subset_samples(phyloseq_rel_nz_f, sample_type == "Nasal") %>% 
                                       sample_data %>% data.frame(check.names = F) %>% .$subject_id, permutations = 10000)
set.seed(seed)
horn_perm_bal_fun  <- vegan::adonis2(by = "terms", distance(subset_samples(phyloseq_rel_nz_f, sample_type == "BAL"), method="horn") ~  lypma + benzonase + host_zero + molysis + qiaamp,
                                 data = subset_samples(phyloseq_rel_nz_f, sample_type == "BAL") %>% sample_data %>% data.frame(check.names = F),
                                 strata = subset_samples(phyloseq_rel_nz_f, sample_type == "BAL") %>%
                                         sample_data %>% data.frame(check.names = F) %>% .$subject_id,
                                  permutations = 10000)
set.seed(seed)
horn_perm_spt_fun <- vegan::adonis2(by = "terms", distance(subset_samples(phyloseq_rel_nz_f, sample_type == "Sputum"), method="horn") ~ lypma + benzonase + host_zero + molysis + qiaamp,
                                data = subset_samples(phyloseq_rel_nz_f, sample_type == "Sputum") %>% sample_data %>% data.frame(check.names = F),
                                strata = subset_samples(phyloseq_rel_nz_f, sample_type == "Sputum")
                                %>% sample_data %>% data.frame(check.names = F) %>% .$subject_id,
                                  permutations = 10000)


horn_perm_inter_fun

Tidy table of PERMANOVA result

horn_perm_inter_fun %>% data.frame(check.names = F) %>% rownames_to_column("row.names") %>% 
        mutate(row.names = case_when(row.names == "sample_type" ~ 'Sample type',
                                     row.names == "treatment" ~ 'Treatment',
                                     row.names == "subject_id" ~ 'Subject',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "sample_type:treatment" ~ 'Sample type * Treatment',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
        mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3)),
               `Pr(>F)` = format(`Pr(>F)`, nsmall = 3)) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        dplyr::select(c("Degree of freedom", "R<sup>2</sup>", "<i>p</i>-value", " ")) %>% 
        kbl(format = "html", escape = 0) %>%
        kable_styling(full_width = 0, html_font = "sans")
                         Degree of freedom   R2      p-value
Sample type              3                   0.342   0.000 ***
Treatment                5                   0.065   0.000 ***
Subject                  17                  0.229   0.000 ***
Sample type * Treatment  15                  0.168   0.000 ***
Residual                 82                  0.196   NA
Total                    122                 1.000   NA
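
The R2 column reported by `adonis2()` is each row's sum of squares divided by the total sum of squares, and with `by = "terms"` the terms are tested sequentially, so their order in the formula matters. The sketch below illustrates this on the `dune` example data shipped with vegan, not on the study data; the formula and the number of permutations are arbitrary choices for illustration.

```r
library(vegan)

# Example data shipped with vegan (not the study data)
data(dune)
data(dune.env)

set.seed(20230925)
d_horn <- vegdist(dune, method = "horn")   # Horn-Morisita dissimilarity
perm_toy <- adonis2(d_horn ~ Management + A1, data = dune.env,
                    by = "terms", permutations = 999)

# R2 of each row = its sum of squares / total sum of squares
r2_manual <- perm_toy$SumOfSqs / perm_toy["Total", "SumOfSqs"]
all.equal(r2_manual, perm_toy$R2)
```

Because the term and residual sums of squares add up to the total, the R2 values of the terms and the residual add up to 1, which is a quick sanity check for tables like the one above.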

Function beta - stratified

Not included in the main text

Table. Degrees of freedom, effect size (R^2), and p-value from permutational ANOVA (PERMANOVA) of functional Horn-Morisita distances with an interaction term and a strata term (Horn-Morisita distance of functions ~ sample type * treatment + subject id, strata = subject id).

Raw result - BAL

horn_perm_bal_fun

Raw result - Nasal

horn_perm_ns_fun

Raw result - Sputum

horn_perm_spt_fun

Tidy table

a <- horn_perm_bal_fun %>% data.frame(check.names = F) %>% rownames_to_column('row.names') %>% 
        mutate(row.names = case_when(row.names == "lypma" ~ 'lyPMA',
                                     row.names == "benzonase" ~ 'Benzonase',
                                     row.names == "host_zero" ~ 'HostZERO',
                                     row.names == "molysis" ~ 'MolYsis',
                                     row.names == "qiaamp" ~ 'QIAamp',
                                     row.names == "subject_id" ~ 'Subject id',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
                mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3))) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        dplyr::select(c("R<sup>2</sup>", "<i>p</i>-value", " ")) 

b <- horn_perm_ns_fun %>% data.frame(check.names = F) %>% rownames_to_column('row.names') %>% 
        mutate(row.names = case_when(row.names == "lypma" ~ 'lyPMA',
                                     row.names == "benzonase" ~ 'Benzonase',
                                     row.names == "host_zero" ~ 'HostZERO',
                                     row.names == "molysis" ~ 'MolYsis',
                                     row.names == "qiaamp" ~ 'QIAamp',
                                     row.names == "subject_id" ~ 'Subject id',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
                mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3))) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        dplyr::select(c("R<sup>2</sup>", "<i>p</i>-value", " ")) 

c <- horn_perm_spt_fun %>% data.frame(check.names = F) %>% rownames_to_column('row.names') %>% 
        mutate(row.names = case_when(row.names == "lypma" ~ 'lyPMA',
                                     row.names == "benzonase" ~ 'Benzonase',
                                     row.names == "host_zero" ~ 'HostZERO',
                                     row.names == "molysis" ~ 'MolYsis',
                                     row.names == "qiaamp" ~ 'QIAamp',
                                     row.names == "subject_id" ~ 'Subject id',
                                     row.names == "log10(Final_reads)" ~ 'log10(Final reads)',
                                     row.names == "Residual" ~ 'Residual',
                                     row.names == "Total" ~ 'Total')) %>% column_to_rownames('row.names') %>% 
                mutate(` ` = case_when(abs(`Pr(>F)`) < 0.001 ~ "***",
                                            abs(`Pr(>F)`) < 0.01 ~ "**",
                                            abs(`Pr(>F)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(across(where(is.numeric), ~ round(.x, digits = 3))) %>% 
        rename("<i>p</i>-value" = "Pr(>F)",
               "R<sup>2</sup>" = "R2",
               "Degree of freedom" = "Df") %>% 
        dplyr::select(c("R<sup>2</sup>", "<i>p</i>-value", " ")) 


tables9_A <- cbind(a, b, c) %>% 
        kbl(format = "html", escape = 0) %>%
        add_header_above(c(" " = 1, "BAL" = 3, "Nasal swab" = 3,"Sputum"= 3)) %>% 
        kable_styling(full_width = 0, html_font = "sans")


tables9_A
           BAL                 Nasal swab          Sputum
           R2      p-value     R2      p-value     R2       p-value
lyPMA      0.047   0.102       0.091   0.018 *     0.023    0.207
Benzonase  0.014   0.725       0.018   0.289       -0.021   1.000
HostZERO   0.036   0.204       0.014   0.646       0.065    0.028 *
MolYsis    0.071   0.050       0.033   0.030 *     0.128    0.004 **
QIAamp     0.119   0.017 *     0.047   0.134       0.356    0.000 ***
Residual   0.713   NA          0.797   NA          0.449    NA
Total      1.000   NA          1.000   NA          1.000    NA

xii. What types of functions were affected by the treatment?

Function DA analysis

#DA analysis - MaAsLin
#Running MaAsLin for all samples without decontam
#for taxa differentially abundant by host depletion method, look to see which ones overlap with potential contaminant taxa

# MaAsLin - y ~ log(final reads) + sample_type + treatment  -----------

#all samples
f_maaslin_all <- read.csv("Project_SICAS2_microbiome/5_Scripts/MGK/Host_depletion_git/data/da_lmer_filt_f_maaslin_all.csv")
f_fit_data_bal <- read.csv("Project_SICAS2_microbiome/5_Scripts/MGK/Host_depletion_git/data/da_lmer_filt_f_fit_data_bal.csv")
f_fit_data_spt <- read.csv("Project_SICAS2_microbiome/5_Scripts/MGK/Host_depletion_git/data/da_lmer_filt_f_fit_data_spt.csv")
f_fit_data_ns <- read.csv("Project_SICAS2_microbiome/5_Scripts/MGK/Host_depletion_git/data/da_lmer_filt_f_fit_data_ns.csv")
#f_fit_data_pos <- read.csv("Project_SICAS2_microbiome/5_Scripts/MGK/Host_depletion_git/data/da_lmer_filt_f_fit_data_pos.csv")

Again, most of the differentially abundant (DA) functions were sample-type specific.

#Making significance table for figure
# Make a significance table for each figure (top 20 features, ranked by smallest q-value)
make_sig_table <- function(data) {
  sig_data <- spread(data[order(data$qval), c("feature", "metadata", "qval")], metadata, qval)
  sig_data$feature <- gsub("[.]", "-", sig_data$feature)
  sig_data$min <- apply(sig_data %>% dplyr::select(c("lypma", "benzonase", "molysis", "host_zero", "qiaamp")), 1, FUN = min)
  sig_data <- sig_data[order(sig_data$min),] %>% dplyr::select("feature", "lypma", "benzonase", "host_zero", "molysis", "qiaamp") %>% .[1:20,]
  sig_data[["feature"]] <- ifelse(sig_data[["feature"]] == "X.Collinsella._massiliensis", "[Collinsella]_massiliensis", sig_data[["feature"]])
  sig_data_italic <- sig_data %>% rownames_to_column(var = "-") %>%
          column_to_rownames(var = "feature") %>% dplyr::select(-c("-")) %>%
          rename(lyPMA = lypma,  Benzonase = benzonase, `HostZERO` = host_zero, MolYsis = molysis, QIAamp = qiaamp)
  sig_data_sig <- ifelse(sig_data_italic < 0.1, "*", NA) %>% data.frame(check.names = F)
  return(list(data = sig_data, data_italic = sig_data_italic, data_sig = sig_data_sig))
}

#f_fit_data_pos <- make_sig_table(f_fit_data_pos)
f_fit_data_bal <- make_sig_table(f_fit_data_bal)
f_fit_data_ns <- make_sig_table(f_fit_data_ns)
f_fit_data_spt <- make_sig_table(f_fit_data_spt)

#f_pos_sig <- subset_taxa(subset_samples(phyloseq_rel_nz, sample_type == "Mock"),
#                                       taxa_names(subset_samples(phyloseq_rel_nz, sample_type == "Mock")) %in% f_fit_data_pos$data$feature)
#f_fit_data_pos$rel <- cbind(f_pos_sig %>% otu_table %>% t, f_pos_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% data.frame(check.names = F) %>% 
#        .[row.names(f_fit_data_pos$data_italic),] %>%  mutate_all(~na_if(., 0)) %>% rownames_to_column("feature")


f_spt_sig <- subset_taxa(subset_samples(phyloseq_rel_nz_f, sample_type == "Sputum"),
                                       taxa_names(subset_samples(phyloseq_rel_nz_f, sample_type == "Sputum")) %in% f_fit_data_spt$data$feature)


f_fit_data_spt$rel <- cbind(f_spt_sig %>% otu_table %>% t, f_spt_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% data.frame(check.names = F) %>% 
        .[row.names(f_fit_data_spt$data_italic),] %>%  mutate_all(~na_if(., 0)) %>% rownames_to_column("feature")

f_ns_sig <- subset_taxa(subset_samples(phyloseq_rel_nz_f, sample_type == "Nasal"),
                                       taxa_names(subset_samples(phyloseq_rel_nz_f, sample_type == "Nasal")) %in% f_fit_data_ns$data$feature)

f_fit_data_ns$rel <- cbind(f_ns_sig %>% otu_table %>% t, f_ns_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% data.frame(check.names = F) %>% 
        .[row.names(f_fit_data_ns$data_italic),] %>%  mutate_all(~na_if(., 0)) %>% rownames_to_column("feature")
f_fit_data_ns$rel$feature <- row.names(f_fit_data_ns$data_sig)

f_bal_sig <- subset_taxa(subset_samples(phyloseq_rel_nz_f, sample_type == "BAL"),
                                       taxa_names(subset_samples(phyloseq_rel_nz_f, sample_type == "BAL")) %in% f_fit_data_bal$data$feature)

f_fit_data_bal$rel <- cbind(f_bal_sig %>% otu_table %>% t, f_bal_sig %>% sample_data) %>% group_by(treatment) %>% summarise_if(is.numeric, mean, na.rm = TRUE) %>% .[, 1:21] %>% column_to_rownames(., "treatment") %>% t () %>% data.frame(check.names = F) %>%
        .[row.names(f_fit_data_bal$data_italic),] %>%
        mutate_all(~na_if(., 0)) %>% rownames_to_column("feature")
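
The core of `make_sig_table()` is a long-to-wide pivot of q-values followed by ranking features on their smallest q-value across treatments. The sketch below shows that logic on made-up data. The feature names, q-values, and `top_n_features` are assumptions for illustration; unlike the original, it uses `pivot_wider()` and `slice_head()`, which do not pad the result with NA rows when fewer features than requested are available.

```r
library(dplyr)
library(tidyr)

# Made-up MaAsLin-style long results (feature, metadata, qval)
toy_res <- data.frame(
  feature  = rep(c("PWY-A", "PWY-B", "PWY-C"), each = 2),
  metadata = rep(c("lypma", "qiaamp"), times = 3),
  qval     = c(0.50, 0.04, 0.20, 0.30, 0.01, 0.90)
)

top_n_features <- 2   # the original keeps the top 20

toy_wide <- toy_res %>%
  pivot_wider(names_from = metadata, values_from = qval) %>%
  mutate(min_q = pmin(lypma, qiaamp, na.rm = TRUE)) %>%  # smallest q per feature
  arrange(min_q) %>%
  slice_head(n = top_n_features)

# "*" marks q < 0.1 and NA marks everything else, as in data_sig
toy_sig <- ifelse(as.matrix(toy_wide[, c("lypma", "qiaamp")]) < 0.1, "*", NA)
rownames(toy_sig) <- toy_wide$feature
toy_sig
```

Note that `pmin(..., na.rm = TRUE)` ignores missing q-values, whereas the original `apply(..., FUN = min)` returns NA for a feature if any treatment is missing; which behaviour is preferable depends on how incomplete fits should be ranked.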

MaAsLin renaming

rename_function_gruop <- function(list_maaslin_result){
        taxa_df <- tax_table(phyloseq$phyloseq_path_cpm) %>% 
                data.frame %>% remove_rownames() %>% rename(feature = "pathway")
        list_maaslin_result$data <-
                list_maaslin_result$data %>% 
                merge(., taxa_df, by = "feature") %>% 
                dplyr::select(-c("feature")) %>%
                rename(feature = "group") %>%
                dplyr::select(c("feature", "lypma", "benzonase", "host_zero", "molysis", "qiaamp"))
        list_maaslin_result$data_italic <- 
                list_maaslin_result$data_italic %>% 
                rownames_to_column("feature") %>%
                merge(., taxa_df, by = "feature") %>% 
                dplyr::select(-c("feature")) %>%
                column_to_rownames("group")
        list_maaslin_result$data_sig <-
                list_maaslin_result$data_sig  %>% 
                        rownames_to_column("feature") %>%
                merge(., taxa_df, by = "feature") %>% 
                dplyr::select(-c("feature")) %>%
                column_to_rownames("group")
        list_maaslin_result$rel <-
                list_maaslin_result$rel %>% 
                merge(., taxa_df, by = "feature") %>% 
                dplyr::select(-c("feature")) %>%
                rename(feature = "group") %>%
                dplyr::select(c("feature", "Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"))
        list_maaslin_result
        
}


#f_fit_data_pos <- rename_function_gruop(f_fit_data_pos)
f_fit_data_bal <- rename_function_gruop(f_fit_data_bal)
f_fit_data_ns <- rename_function_gruop(f_fit_data_ns)
f_fit_data_spt <- rename_function_gruop(f_fit_data_spt)

Functional MaAsLin raw results

f_maaslin_all

Fig. S10 MaAsLin function - volcano plot

Fig. S10. Volcano plot of the differential abundance of functions for each treatment, estimated with a MaAsLin model (copies per million of each function ~ sample type + lyPMA + Benzonase + HostZERO + MolYsis + QIAamp, random effect = subject id).
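
In a volcano plot of this kind, the y-axis is -log10(q), so a q-value cut-off of 0.1 corresponds to a horizontal line at y = 1. The short sketch below works through that arithmetic on made-up rows; the column names mirror those used in the plotting code (`metadata`, `Estimate`, `qval`), while the values and `q_cutoff` are assumptions for illustration.

```r
# Made-up MaAsLin-style rows, for illustration only
toy_maaslin <- data.frame(
  metadata = c("lypma", "lypma", "qiaamp", "qiaamp"),
  Estimate = c(0.8, -0.1, -1.2, 0.3),
  qval     = c(0.02, 0.60, 0.001, 0.10)
)

q_cutoff <- 0.1
y_cutoff <- -log10(q_cutoff)                  # height of the horizontal reference line

toy_maaslin$neg_log10_q <- -log10(toy_maaslin$qval)
toy_maaslin$significant <- toy_maaslin$qval < q_cutoff

# Number of features above the line, per factor
table(toy_maaslin$metadata, toy_maaslin$significant)
```

Points above the reference line are those with q < 0.1; a feature with q exactly equal to the cut-off sits on the line and is not counted as significant under a strict inequality.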

#Volcano plot

figS9 <- ggplot(f_maaslin_all, aes(y = -log10(qval), x = Estimate, col = metadata)) +
        theme_classic(base_family = "sans") +
        #labs(tag = "A") +
        geom_point(size = 2, alpha = 0.3, stroke = 0) +
        xlab("Change estimate of CLR-normalized CPM") +
        ylab("-log<sub>10</sub>(*q*-value)") +
        geom_hline(yintercept = 1, col = "gray") +
        geom_vline(xintercept = 0, col = "gray") +
        annotate(family = "sans",
                 geom='richtext',
                 x=0, y=10,
                 label = "<i>q</i>-value = 0.1, fold-change = 0") +
        theme(legend.position = "top", axis.title.y = ggtext::element_markdown()) +
        scale_color_manual(values = c("#a65628",
                                      "grey",
                                      "#fb9a99",
                                      "#33a02c",
                                      "#b2df8a",
                                      "#1f78b4",
                                      "#a6cee3"),
                           breaks = c("log10.Final_reads",
                                      "sample_type",
                                      "lypma",
                                      "benzonase",
                                      "host_zero",
                                      "molysis",
                                      "qiaamp"), 
                           labels = c("log<sub>10</sub>(Final reads)",
                                      "Sample type",
                                      "lyPMA",
                                      "Benzonase",
                                      "HostZERO",
                                      "MolYsis",
                                      "QIAamp")) + #color using https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=6
        guides(col = guide_legend(title = "Factors", title.position = "top", nrow = 2))


png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS11.png",   # Path of the output file
    width = 180, # The width of the plot, in the units set below (mm)
    height = 90, # The height of the plot, in the units set below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue

figS9
# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2
figS9

Fig. S11 Function balloon plot

Fig. S11. Mean copies per million of the top 20 significant functions identified by differential abundance analysis using MaAsLin. Analyses were stratified by sample type. (A) Mock community, (B) bronchoalveolar lavage, (C) nasal swabs, and (D) sputum. Statistical significance was noted at q-value < 0.1.

#ffff33 qia

f5b <- merge(f_fit_data_bal$rel %>%
              gather(treatment,
                     value,
                     Untreated:QIAamp,
                     factor_key=TRUE),
      f_fit_data_bal$data_italic %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     qval,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
      by.x = c('feature', 'treatment'),
      by.y = c('feature', 'treatment'),
      all = T) %>%
        
        merge(f_fit_data_bal$data_sig %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     sig,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
              by.x = c('feature', 'treatment'),
              by.y = c('feature', 'treatment'),
              all = T) %>%
        mutate(sig = case_when(sig == "*" ~ "< 0.1", # data_sig holds "*" (q < 0.1) or NA
                               .default = "> 0.1"),
               value = value * 1000000) %>%



#Balloon plot
        ggballoonplot(size = "value", y = "feature", x= "treatment", fill = "sig") +
        theme_classic(base_family = "sans") +
        #colors for qvalues
        #xlab("Experimental group") +
        #ylab("Species") +
        #labs(tag = "A")  +
        ggtitle("A  BAL") +
        theme(panel.grid.major = element_line(colour = "grey"),
              legend.position = "top",
              axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
              #Element markdown for taxa name italicizing
              axis.text.y = ggtext::element_markdown(size = 8),
              axis.title.y = element_blank(),
              axis.title.x = element_blank(),
              plot.margin = unit(c(0,0.2,0,1), 'lines'))  +
        scale_fill_manual(values = c("red", "grey"), aes(y = feature,
                      x = treatment,
                      label = sig)) +
        guides(fill = guide_legend(title = c(expression(paste(italic("q"),
                                                       "-value",
                                                       sep = ""))),
                                   title.position = "top",
                                   nrow = 2,
                                   override.aes = list(size=3)),
               size = guide_legend(title = "Copies per million",
                                   title.position = "top",
                                   order = 1,
                                   nrow = 1)
               ) + 
        scale_x_discrete(labels=c("control" = "Untreated",
                                  "lypma" = "lyPMA",
                                  "benzonase" = "Benzonase",
                                  "host_zero" = "Host-zero",
                                  "molysis" = "MolYsis",
                                  "qiaamp" = "QIAamp")
                         )

f5c <- merge(f_fit_data_ns$rel %>%
              gather(treatment,
                     value,
                     Untreated:QIAamp,
                     factor_key=TRUE),
      f_fit_data_ns$data_italic %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     qval,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
      by.x = c('feature', 'treatment'),
      by.y = c('feature', 'treatment'),
      all = T) %>%
        
        merge(f_fit_data_ns$data_sig %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     sig,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
              by.x = c('feature', 'treatment'),
              by.y = c('feature', 'treatment'),
              all = T) %>%
        mutate(sig = case_when(sig == "*" ~ "< 0.1", # data_sig holds "*" (q < 0.1) or NA
                               .default = "> 0.1"),
               value = value * 1000000) %>%


#Balloon plot
        ggballoonplot(size = "value", y = "feature", x= "treatment", fill = "sig") +
        theme_classic(base_family = "sans") +
        #colors for qvalues
        #xlab("Experimental group") +
        #ylab("Species") +
        #labs(tag = "B")  +
        ggtitle("B  Nasal swab") +
        theme(panel.grid.major = element_line(colour = "grey"),
              legend.position = "top",
              axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
              #Element markdown for taxa name italicizing
              axis.text.y = ggtext::element_markdown(size = 8),
              axis.title.y = element_blank(),
              axis.title.x = element_blank(),
              plot.margin = unit(c(0,0.2,0,1), 'lines'))  +
        scale_fill_manual(values = c("red", "grey"), aes(y = feature,
                      x = treatment,
                      label = sig)) +
        guides(fill = guide_legend(title = c(expression(paste(italic("q"),
                                                       "-value",
                                                       sep = ""))),
                                   title.position = "top",
                                   override.aes = list(size=3)),
               size = guide_legend(title = "Copies per million",
                                   title.position = "top",
                                   order = 1,
                                   nrow = 1)
               ) + 
        scale_x_discrete(labels=c("control" = "Untreated",
                                  "lypma" = "lyPMA",
                                  "benzonase" = "Benzonase",
                                  "host_zero" = "Host-zero",
                                  "molysis" = "MolYsis",
                                  "qiaamp" = "QIAamp")
                         )

f5d <- merge(f_fit_data_spt$rel %>%
              gather(treatment,
                     value,
                     Untreated:QIAamp,
                     factor_key=TRUE),
      f_fit_data_spt$data_italic %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     qval,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
      by.x = c('feature', 'treatment'),
      by.y = c('feature', 'treatment'),
      all = T) %>%
        merge(f_fit_data_spt$data_sig %>%
              rownames_to_column("feature") %>%
              gather(treatment,
                     sig,
                     lyPMA:QIAamp,
                     factor_key=TRUE),
              by.x = c('feature', 'treatment'),
              by.y = c('feature', 'treatment'),
              all = T) %>%
        mutate(sig = case_when(sig < 0.1 ~ "< 0.1",
                               .default = "> 0.1"),
               value = value * 1000000) %>%


#Balloon plot
        ggballoonplot(size = "value", y = "feature", x= "treatment", fill = "sig") +
        theme_classic(base_family = "sans") +
        #colors for qvalues
        #xlab("Experimental group") +
        #ylab("Species") +
        #labs(tag = "C")  +
        ggtitle("C  Sputum") +
        theme(panel.grid.major = element_line(colour = "grey"),
              legend.position = "top",
              axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
              #Element markdown for taxa name italicizing
              axis.text.y = ggtext::element_markdown(size = 8),
              axis.title.y = element_blank(),
              axis.title.x = element_blank(),
              plot.margin = unit(c(0,0.2,0,1), 'lines'))  +
        scale_fill_manual(values = c("red", "grey"), aes(y = feature,
                      x = treatment,
                      label = sig)) +
        guides(fill = guide_legend(title = c(expression(paste(italic("q"),
                                                       "-value",
                                                       sep = ""))),
                                   title.position = "top",
                                   override.aes = list(size=5)),
               size = guide_legend(title = "Copies per million",
                                   title.position = "top",
                                   order = 1,
                                   nrow = 1)
               ) + 
        scale_x_discrete(labels=c("control" = "Untreated",
                                  "lypma" = "lyPMA",
                                  "benzonase" = "Benzonase",
                                  "host_zero" = "Host-zero",
                                  "molysis" = "MolYsis",
                                  "qiaamp" = "QIAamp")
                         )

figS10 <- ggarrange(f5d %>% lemon::g_legend() %>% as_ggplot,
                  f5b,
                  f5c,
                  f5d,
                  ncol=1, heights = c(1.5, 4, 4, 4),
                  legend = "none",
                  align = "hv")


annotate_figure(figS10,
                left = text_grob("Predicted function",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Treatment",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11, hjust = -3)
                
)

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS12.png",   # Output file path
    width = 240, # Width of the plot, in the units set below (mm)
    height = 220, # Height of the plot, in the units set below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue



annotate_figure(figS10,
                left = text_grob("Predicted function",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Treatment",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11, hjust = -3)
                
)


# alpha diversity plots
#ggarrange(f4ad, ggarrange(f4e, f4f, ncol = 2),
#          ncol = 1) # alpha diversity plots

dev.off()
## quartz_off_screen 
##                 2

3.5. Sensitivity analysis after decontamination

xiii. Is species richness similar after decontamination?

Sensitivity analysis

Fig. S12. Rarefaction curve of species richness

Fig. S12. Rarefaction curves for (A) species richness and (B) functional richness, stratified by sample type, after removing possible contaminant taxa identified by decontam and low-prevalence taxa.

As a sanity check, rarefaction curves were generated; richness appeared to reach saturation with sequencing depth.
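To illustrate what saturation means here, the following is a minimal base-R sketch of rarefaction by repeated subsampling. It is not the procedure used for the figures in this report (those plot observed richness against final reads); the count vector, the helper `rarefied_richness()`, and all numbers are hypothetical.

```r
# Toy example: rarefy one sample's count vector by repeated subsampling.
# All names and numbers are illustrative, not taken from the study data.
set.seed(20230925)

# Hypothetical species counts for a single sample (30 taxa, skewed abundances)
counts <- round(rexp(30, rate = 1 / 200))
names(counts) <- paste0("taxon_", seq_along(counts))

# Mean richness at a given depth, averaged over random subsamples of reads
rarefied_richness <- function(counts, depth, n_iter = 50) {
        pool <- rep(names(counts), times = counts)   # one entry per read
        mean(replicate(n_iter, length(unique(sample(pool, depth)))))
}

depths <- round(seq(10, sum(counts), length.out = 8))
richness_curve <- vapply(depths,
                         function(d) rarefied_richness(counts, d),
                         numeric(1))

# A curve that flattens toward the observed richness suggests saturation
data.frame(depth = depths, richness = round(richness_curve, 1))
```

If the last few depths add almost no new taxa, deeper sequencing would be unlikely to change the richness estimate much for that sample.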

fig_rarefraction <- phyloseq$phyloseq_count %>% 
        sample_data %>% 
        data.frame %>%
        subset(., !is.na(.$treatment) & sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
ggplot(., aes(x = log10(Final_reads/1000000), y = S.obs, col = treatment)) +
        geom_point() +
        theme_classic(base_family = "sans") +
        #xlab("Final reads x 10<sup>6</sup>") +
        ylab("Microbial species richness") +
        labs(tag = "A") +
        theme(axis.title.x = element_blank(), legend.position = "top") +
        guides(col = guide_legend(title = "Treatment", title.position = "top", nrow = 1)) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                          name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        facet_wrap(~sample_type, scales = "free", nrow = 1)

fig_rarefraction_function <- phyloseq$phyloseq_path_rpk %>% 
        sample_data %>% 
        data.frame %>%
        subset(., !is.na(.$treatment) & sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
ggplot(., aes(x = log10(Final_reads/1000000), y = S.obs, col = treatment)) +
        geom_point() +
        theme_classic(base_family = "sans") +
        xlab("log<sub>10</sub>(Final reads x 10<sup>6</sup>)") +
        ylab("Functional richness") +
        labs(tag = "C") +
        theme(axis.title.x = element_markdown(), legend.position = "top") +
        guides(col = guide_legend(title = "Treatment", title.position = "top", nrow = 1)) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                          name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        facet_wrap(~sample_type, scales = "free", nrow = 1)



fig_rarefraction_viral <- phyloseq$phyloseq_count %>% 
        sample_data %>% 
        data.frame %>%
        subset(., !is.na(.$treatment) & sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
ggplot(., aes(x = log10(Final_reads/1000000), y = V.obs, col = treatment)) +
        geom_point() +
        theme_classic(base_family = "sans") +
        #xlab("Final reads x 10<sup>6</sup>") +
        ylab("Viral richness") +
        labs(tag = "B") +
        theme(axis.title.x = element_blank(), legend.position = "top") +
        guides(col = guide_legend(title = "Treatment", title.position = "top", nrow = 1)) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                          name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        facet_wrap(~sample_type, scales = "free", nrow = 1)



png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS13_log.png",   # Output file path
    width = 180, # Width of the plot, in the units set below (mm)
    height = 180, # Height of the plot, in the units set below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue

ggarrange(fig_rarefraction, fig_rarefraction_viral, fig_rarefraction_function, common.legend = T, ncol = 1)

dev.off()
## quartz_off_screen 
##                 2
fig_rarefraction <- phyloseq$phyloseq_count %>% 
        sample_data %>% 
        data.frame %>%
        subset(., !is.na(.$treatment) & sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
ggplot(., aes(x = Final_reads/1000000, y = S.obs, col = treatment)) +
        geom_point() +
        theme_classic(base_family = "sans") +
        #xlab("Final reads x 10<sup>6</sup>") +
        ylab("Microbial species richness") +
        labs(tag = "A") +
        theme(axis.title.x = element_blank(), legend.position = "top") +
        guides(col = guide_legend(title = "Treatment", title.position = "top", nrow = 1)) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                          name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        facet_wrap(~sample_type, scales = "free", nrow = 1)

fig_rarefraction_function <- phyloseq$phyloseq_path_rpk %>% 
        sample_data %>% 
        data.frame %>%
        subset(., !is.na(.$treatment) & sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
ggplot(., aes(x = Final_reads/1000000, y = S.obs, col = treatment)) +
        geom_point() +
        theme_classic(base_family = "sans") +
        xlab("Final reads x 10<sup>6</sup>") +
        ylab("Functional richness") +
        labs(tag = "C") +
        theme(axis.title.x = element_markdown(), legend.position = "top") +
        guides(col = guide_legend(title = "Treatment", title.position = "top", nrow = 1)) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                          name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        facet_wrap(~sample_type, scales = "free", nrow = 1)



fig_rarefraction_viral <- phyloseq$phyloseq_count %>% 
        sample_data %>% 
        data.frame %>%
        subset(., !is.na(.$treatment) & sample_type %in% c("BAL", "Nasal", "Sputum")) %>%
ggplot(., aes(x = Final_reads/1000000, y = V.obs, col = treatment)) +
        geom_point() +
        theme_classic(base_family = "sans") +
        #xlab("Final reads x 10<sup>6</sup>") +
        ylab("Viral richness") +
        labs(tag = "B") +
        theme(axis.title.x = element_blank(), legend.position = "top") +
        guides(col = guide_legend(title = "Treatment", title.position = "top", nrow = 1)) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                          name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        facet_wrap(~sample_type, scales = "free", nrow = 1)

ggarrange(fig_rarefraction, fig_rarefraction_viral, fig_rarefraction_function, common.legend = T, ncol = 1)

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS13_updated.png",   # Output file path
    width = 180, # Width of the plot, in the units set below (mm)
    height = 180, # Height of the plot, in the units set below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue

ggarrange(fig_rarefraction, fig_rarefraction_viral, fig_rarefraction_function, common.legend = T, ncol = 1)

dev.off()
## quartz_off_screen 
##                 2

Fig. S13. Species richness of decontaminated output

Because some species richness values increased markedly after decontamination, these data cannot be used to calculate alpha diversity indices.
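One way to make this check explicit is to compare per-sample richness before and after decontamination and flag any increase. The sketch below uses a hypothetical data frame; the column names and values are illustrative only and do not come from the study.

```r
# Hypothetical per-sample richness before and after a decontamination step
richness <- data.frame(
        sample_id = c("s1", "s2", "s3", "s4"),
        before    = c(12, 30, 8, 45),
        after     = c(10, 41, 8, 97)
)

# If decontamination only removes taxa, richness should not go up,
# so a positive change is treated as a warning sign here
richness$change  <- richness$after - richness$before
richness$flagged <- richness$change > 0

richness[richness$flagged, ]
```

Flagged samples would then be inspected before any diversity index is computed from the decontaminated table.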

Fig. S13. Species richness of (A) raw data after prevalence and abundance filtering, (B) data decontaminated with decontam (ref. 37), and (C) data decontaminated with tinyvamp.

f10a <- ggplot(subset(sample_data(phyloseq$phyloseq_count) %>% 
                             data.frame, sample_data(phyloseq$phyloseq_count)$sample_type %in% c("Sputum", "Nasal", "BAL"#, "Mock"
                                                                                                 )), aes(x = treatment, y = S.obs)) +
        geom_jitter(aes(color = treatment), position = position_jitter(0.2),
                    size = 1.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        ylab("Species richness") +
        xlab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        labs(tag = "A") +
        ggtitle("Prevalence & abundance filtered data") +
        theme(plot.tag = element_text(size = 15),
              axis.text.x = element_blank(),
              axis.title.x = element_blank(),
              axis.title.y = element_blank(),
              axis.ticks.x = element_blank(),
              legend.position = "top") +
        facet_wrap(~sample_type, nrow = 1) + 
        guides(col = guide_legend(nrow = 1)) 

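# NOTE: this panel reads phyloseq$phyloseq_count, the same object as panel A (f10a);
# confirm that this is the intended source for "Decontaminated data 1".
f10b <- ggplot(subset(sample_data(phyloseq$phyloseq_count) %>% 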
                             data.frame, sample_data(phyloseq$phyloseq_count)$sample_type %in% c("Sputum", "Nasal", "BAL"#, "Mock"
                                                                                                 )), aes(x = treatment, y = S.obs)) +
        geom_jitter(aes(color = treatment), position = position_jitter(0.2),
                    size = 1.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        ylab("Species richness") +
        xlab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        labs(tag = "B") +
        ggtitle("Decontaminated data 1") +
        theme(plot.tag = element_text(size = 15),
              axis.text.x = element_blank(),
              axis.title.x = element_blank(),
              axis.title.y = element_blank(),
              axis.ticks.x = element_blank(),
              legend.position = "top") +
        facet_wrap(~sample_type, nrow = 1) + 
        guides(col = guide_legend(nrow = 1)) 


f10c <- ggplot(subset(sample_data(phyloseq_tv) %>% 
                             data.frame, sample_data(phyloseq_tv)$sample_type %in% c("Sputum", "Nasal", "BAL", "Mock")), aes(x = treatment, y = S.obs)) +
        geom_jitter(aes(color = treatment), position = position_jitter(0.2),
                    size = 1.2, alpha = 0.3, stroke = 0) +
        stat_summary(aes(color = treatment),
                             fun.data="mean_sdl",  fun.args = list(mult=1), 
                             geom = "pointrange",  size = 0.4) +
        ylab("Species richness") +
        xlab("Treatment group") +
        theme_classic (base_size = 12, base_family = "sans") + 
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"), name = "Treatment", labels = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #color using 
        labs(tag = "C") +
        ggtitle("Decontaminated data 2") +
        theme(plot.tag = element_text(size = 15),
              axis.text.x = element_blank(), 
              axis.title.x = element_blank(),
              axis.title.y = element_blank(),
              axis.ticks.x = element_blank(),
              legend.position = "top") +
        facet_wrap(~sample_type, nrow = 1) + 
        guides(col = guide_legend(nrow = 1)) 

figa1 <- ggarrange(f10a, f10b, f10c, common.legend = T, ncol = 1)



annotate_figure(figa1,
                left = text_grob("Species richness",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Treatment",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11)
)

png(file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/FigureS14.png",   # Output file path
    width = 180, # Width of the plot, in the units set below (mm)
    height = 180, # Height of the plot, in the units set below (mm)
    units = "mm",
    res = 600
) #fixing multiple page issue

annotate_figure(figa1,
                left = text_grob("Species richness",
                                 rot = 90,
                                 family = "sans", 
                                 size = 11),
                bottom = text_grob("Treatment",
                                 rot = 0,
                                 family = "sans", 
                                 size = 11)
)


dev.off()
## quartz_off_screen 
##                 2

Table S11. Species richness change after decontamination

Table S11. Effect size (95% confidence interval) and p-value for species richness after decontamination with decontam (ref. 37; decontaminated data 1) and tinyvamp (decontaminated data 2). The change was tested using the model lmer(species richness ~ treatment + (1 | subject id)). Statistical significance is noted as *: p-value < 0.05, **: p-value < 0.01, and ***: p-value < 0.001.
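The significance codes can be assigned with a small helper. This is a minimal sketch using base R `cut()`; the function name `p_to_stars()` is hypothetical, and the example p-values are copied from the decontam/BAL and decontam/nasal model summaries printed in this section.

```r
# Map p-values to star codes using strict "<" thresholds
# (p < 0.001 "***", p < 0.01 "**", p < 0.05 "*", otherwise empty)
p_to_stars <- function(p) {
        as.character(cut(p,
                         breaks = c(0, 0.001, 0.01, 0.05, Inf),
                         labels = c("***", "**", "*", ""),
                         right  = FALSE))
}

# MolYsis and lyPMA (BAL), intercept and lyPMA (nasal)
p_to_stars(c(0.00467, 0.91561, 6.6e-07, 0.019560))
# "**" "" "***" "*"
```

Using `right = FALSE` keeps each threshold exclusive on the upper side, which matches the `<` comparisons in the `case_when()` calls used for the tidy tables.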

Raw results - decontam/BAL

sr_lmer_bal_decontam <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data(phyloseq_decontam$phyloseq_rel) %>% 
                             data.frame %>% 
                            subset(.,.$sample_type == "BAL"))
sr_lmer_bal_decontam %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: sample_data(phyloseq_decontam$phyloseq_rel) %>% data.frame %>%  
##     subset(., .$sample_type == "BAL")
## 
## REML criterion at convergence: 190.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3988 -0.5046 -0.1313  0.5047  2.6286 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 93.81    9.685   
##  Residual               78.16    8.841   
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error     df t value Pr(>|t|)   
## (Intercept)           3.400      5.865  9.647   0.580  0.57537   
## treatmentlyPMA        0.600      5.591 20.000   0.107  0.91561   
## treatmentBenzonase    5.800      5.591 20.000   1.037  0.31197   
## treatmentHostZERO     8.200      5.591 20.000   1.467  0.15805   
## treatmentMolYsis     17.800      5.591 20.000   3.183  0.00467 **
## treatmentQIAamp       9.200      5.591 20.000   1.645  0.11552   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.477                            
## trtmntBnzns -0.477  0.500                     
## trtmntHZERO -0.477  0.500  0.500              
## trtmntMlYss -0.477  0.500  0.500  0.500       
## trtmntQIAmp -0.477  0.500  0.500  0.500  0.500

Raw results - decontam/Nasal

sr_lmer_ns_decontam <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data(phyloseq_decontam$phyloseq_rel) %>% 
                             data.frame %>% 
                            subset(.,.$sample_type == "Nasal"))
sr_lmer_ns_decontam %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: sample_data(phyloseq_decontam$phyloseq_rel) %>% data.frame %>%  
##     subset(., .$sample_type == "Nasal")
## 
## REML criterion at convergence: 175.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8245 -0.5693 -0.1774  0.6449  3.0756 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept)  1.089   1.043   
##  Residual               16.499   4.062   
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          8.4000     1.3262 28.7973   6.334  6.6e-07 ***
## treatmentlyPMA      -5.6016     2.2464 25.2088  -2.494 0.019560 *  
## treatmentBenzonase  -0.5743     2.2468 25.3111  -0.256 0.800319    
## treatmentHostZERO    8.4257     2.2468 25.3111   3.750 0.000924 ***
## treatmentMolYsis     5.2016     2.2464 25.2088   2.316 0.029002 *  
## treatmentQIAamp      6.3743     2.2468 25.3111   2.837 0.008837 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.554                            
## trtmntBnzns -0.554  0.315                     
## trtmntHZERO -0.554  0.315  0.346              
## trtmntMlYss -0.554  0.308  0.339  0.339       
## trtmntQIAmp -0.554  0.339  0.307  0.307  0.315

Raw results - decontam/Sputum

sr_lmer_spt_decontam <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data(phyloseq_decontam$phyloseq_rel) %>% 
                             data.frame %>% 
                            subset(.,.$sample_type == "Sputum"))
sr_lmer_spt_decontam %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: sample_data(phyloseq_decontam$phyloseq_rel) %>% data.frame %>%  
##     subset(., .$sample_type == "Sputum")
## 
## REML criterion at convergence: 216.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0585 -0.4479 -0.1131  0.3572  1.7178 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 300.6    17.34   
##  Residual               224.7    14.99   
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          15.600     10.250   9.100   1.522  0.16198    
## treatmentlyPMA       35.000      9.481  20.000   3.692  0.00144 ** 
## treatmentBenzonase   63.400      9.481  20.000   6.687 1.65e-06 ***
## treatmentHostZERO    97.600      9.481  20.000  10.295 1.94e-09 ***
## treatmentMolYsis    107.000      9.481  20.000  11.286 3.99e-10 ***
## treatmentQIAamp      80.800      9.481  20.000   8.523 4.33e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.462                            
## trtmntBnzns -0.462  0.500                     
## trtmntHZERO -0.462  0.500  0.500              
## trtmntMlYss -0.462  0.500  0.500  0.500       
## trtmntQIAmp -0.462  0.500  0.500  0.500  0.500

Raw results - Tinyvamp/BAL

sr_lmer_bal_tv <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data(phyloseq_tv) %>% 
                             data.frame %>% 
                            subset(.,.$sample_type == "BAL"))
sr_lmer_bal_tv %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data(phyloseq_tv) %>% data.frame %>% subset(., .$sample_type ==  
##     "BAL")
## 
## REML criterion at convergence: 162.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.60044 -0.62449 -0.00833  0.40102  2.22867 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 42.84    6.545   
##  Residual               44.84    6.697   
## Number of obs: 28, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error     df t value Pr(>|t|)  
## (Intercept)           3.619      4.497 12.920   0.805   0.4355  
## treatmentlyPMA        1.000      4.735 18.131   0.211   0.8351  
## treatmentBenzonase    3.981      4.541 18.318   0.877   0.3920  
## treatmentHostZERO     5.981      4.541 18.318   1.317   0.2041  
## treatmentMolYsis     11.981      4.541 18.318   2.638   0.0165 *
## treatmentQIAamp       6.181      4.541 18.318   1.361   0.1900  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.526                            
## trtmntBnzns -0.571  0.521                     
## trtmntHZERO -0.571  0.521  0.565              
## trtmntMlYss -0.571  0.521  0.565  0.565       
## trtmntQIAmp -0.571  0.521  0.565  0.565  0.565

Raw results - Tinyvamp/Nasal

sr_lmer_ns_tv <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data(phyloseq_tv) %>% 
                             data.frame %>% 
                            subset(.,.$sample_type == "Nasal")) 
sr_lmer_ns_tv %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data(phyloseq_tv) %>% data.frame %>% subset(., .$sample_type ==  
##     "Nasal")
## 
## REML criterion at convergence: 141.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.94756 -0.56006 -0.08626  0.42344  2.24527 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 0.3913   0.6255  
##  Residual               5.0338   2.2436  
## Number of obs: 35, groups:  subject_id, 10
## 
## Fixed effects:
##                    Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)         6.30000    0.73655 28.73600   8.553 2.17e-09 ***
## treatmentlyPMA     -1.51165    1.24265 25.29799  -1.216   0.2350    
## treatmentBenzonase  0.09912    1.24293 25.41039   0.080   0.9371    
## treatmentHostZERO   6.29912    1.24293 25.41039   5.068 2.99e-05 ***
## treatmentMolYsis    3.11165    1.24265 25.29799   2.504   0.0191 *  
## treatmentQIAamp     6.30088    1.24293 25.41039   5.069 2.98e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.550                            
## trtmntBnzns -0.550  0.312                     
## trtmntHZERO -0.550  0.312  0.348              
## trtmntMlYss -0.550  0.304  0.340  0.340       
## trtmntQIAmp -0.550  0.340  0.303  0.303  0.312

Raw results - Tinyvamp/Sputum

sr_lmer_spt_tv <- lmer(S.obs ~ treatment + (1|subject_id),
                    data = sample_data(phyloseq_tv) %>% 
                             data.frame %>% 
                            subset(.,.$sample_type == "Sputum")) 
sr_lmer_spt_tv %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: S.obs ~ treatment + (1 | subject_id)
##    Data: 
## sample_data(phyloseq_tv) %>% data.frame %>% subset(., .$sample_type ==  
##     "Sputum")
## 
## REML criterion at convergence: 183.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6411 -0.5215 -0.1671  0.4478  1.9234 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  subject_id (Intercept) 75.09    8.665   
##  Residual               55.91    7.477   
## Number of obs: 30, groups:  subject_id, 5
## 
## Fixed effects:
##                    Estimate Std. Error     df t value Pr(>|t|)    
## (Intercept)           8.000      5.119  9.081   1.563  0.15221    
## treatmentlyPMA       14.800      4.729 20.000   3.130  0.00528 ** 
## treatmentBenzonase   26.200      4.729 20.000   5.540 2.01e-05 ***
## treatmentHostZERO    45.200      4.729 20.000   9.558 6.73e-09 ***
## treatmentMolYsis     51.800      4.729 20.000  10.954 6.70e-10 ***
## treatmentQIAamp      35.200      4.729 20.000   7.443 3.49e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtPMA trtmnB tHZERO trtmMY
## trtmntlyPMA -0.462                            
## trtmntBnzns -0.462  0.500                     
## trtmntHZERO -0.462  0.500  0.500              
## trtmntMlYss -0.462  0.500  0.500  0.500       
## trtmntQIAmp -0.462  0.500  0.500  0.500  0.500

Tidy table of LMER for decontaminated species richness
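The tidy tables combine each estimate with its confidence limits into a single "Effect size (95% CI)" string. As a compact illustration of that formatting step only, here is a base-R sketch; `format_effect()` and the toy values are hypothetical and are not results from the fitted models.

```r
# Build "estimate (lower, upper)" strings with one decimal place.
# `est`, `lower`, `upper` are plain numeric vectors; the values are made up.
format_effect <- function(est, lower, upper) {
        sprintf("%.1f (%.1f, %.1f)", est, lower, upper)
}

toy <- data.frame(
        term  = c("(Intercept)", "group_B"),
        est   = c(2.0, 10.4),
        lower = c(-1.04, 6.12),
        upper = c(5.26, 14.68)
)
toy$`Effect size (95% CI)` <- format_effect(toy$est, toy$lower, toy$upper)

toy[, c("term", "Effect size (95% CI)")]
```

Wrapping the formatting in one function is a design choice that would let the same call be reused for every sample type and decontamination method instead of repeating the pipeline.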

sr_lmer_bal_decontam_kbl <- sr_lmer_bal_decontam %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_bal_decontam %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
sr_lmer_ns_decontam_kbl  <- sr_lmer_ns_decontam %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_ns_decontam %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        

sr_lmer_spt_decontam_kbl <- sr_lmer_spt_decontam %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_spt_decontam %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]

# Same tables for the tinyvamp-decontaminated data ("Decontaminated data 2")
sr_lmer_bal_tv_kbl <- sr_lmer_bal_tv %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_bal_tv %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        
sr_lmer_ns_tv_kbl  <- sr_lmer_ns_tv %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_ns_tv %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
                        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]


sr_lmer_spt_tv_kbl <- sr_lmer_spt_tv %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        rename("<i>p</i>-value" = "Pr(>|t|)",
               t_value = "t value") %>%
        merge(., 
              sr_lmer_spt_tv %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
        column_to_rownames("Row.names") %>%
        rownames_to_column(var = "x") %>% mutate(x = gsub("treatment|sample_type", "", x)) %>% mutate(x = gsub(":", " * ", x)) %>%
        mutate(x = gsub("bal_log_centered_final_reads", "log<sub>10</sub>(Final reads)", x)) %>%
        column_to_rownames(var = "x") %>% 
        mutate("Effect size (95% CI)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             conf, 
                             sep = " "),
               "<i>p</i>-value" = round(`<i>p</i>-value`, 3)) %>%
        dplyr::select(c("Effect size (95% CI)", "<i>p</i>-value", " ")) %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),]
        


tableS11 <- cbind(
        cbind(sr_lmer_bal_decontam_kbl, sr_lmer_ns_decontam_kbl, sr_lmer_spt_decontam_kbl),
        cbind(sr_lmer_bal_tv_kbl, sr_lmer_ns_tv_kbl, sr_lmer_spt_tv_kbl) %>% remove_rownames()) %>%
        kbl(format = "html", escape = FALSE) %>%
        add_header_above(c(" " = 1, "BAL" = 3, "Nasal swab" = 3, "Sputum" = 3, "BAL" = 3, "Nasal swab" = 3, "Sputum" = 3)) %>% 
        add_header_above(c(" " = 1, "Decontaminated data 1" = 9, "Decontaminated data 2" = 9)) %>% 
        kable_styling(full_width = FALSE, html_font = "sans")


tableS11
Decontaminated data 1

| | BAL: effect size (95% CI) | BAL: p-value | Nasal swab: effect size (95% CI) | Nasal swab: p-value | Sputum: effect size (95% CI) | Sputum: p-value |
|---|---|---|---|---|---|---|
| (Intercept) | 3.4 (-8.0, 14.8) | 0.575 | 8.4 (6.0, 10.8) | 0.000 *** | 15.6 (-4.4, 35.6) | 0.162 |
| lyPMA | 0.6 (-9.6, 10.8) | 0.916 | -5.6 (-9.7, -1.5) | 0.020 * | 35.0 (17.7, 52.3) | 0.001 ** |
| Benzonase | 5.8 (-4.4, 16.0) | 0.312 | -0.6 (-4.7, 3.6) | 0.800 | 63.4 (46.1, 80.7) | 0.000 *** |
| HostZERO | 8.2 (-2.0, 18.4) | 0.158 | 8.4 (4.3, 12.6) | 0.001 *** | 97.6 (80.3, 114.9) | 0.000 *** |
| MolYsis | 17.8 (7.6, 28.0) | 0.005 ** | 5.2 (1.1, 9.3) | 0.029 * | 107.0 (89.7, 124.3) | 0.000 *** |
| QIAamp | 9.2 (-1.0, 19.4) | 0.116 | 6.4 (2.2, 10.5) | 0.009 ** | 80.8 (63.5, 98.1) | 0.000 *** |

Decontaminated data 2

| | BAL: effect size (95% CI) | BAL: p-value | Nasal swab: effect size (95% CI) | Nasal swab: p-value | Sputum: effect size (95% CI) | Sputum: p-value |
|---|---|---|---|---|---|---|
| (Intercept) | 3.6 (-4.8, 12.1) | 0.436 | 6.3 (4.9, 7.7) | 0.000 *** | 8.0 (-2.0, 18.0) | 0.152 |
| lyPMA | 1.0 (-7.6, 9.6) | 0.835 | -1.5 (-3.8, 0.8) | 0.235 | 14.8 (6.2, 23.4) | 0.005 ** |
| Benzonase | 4.0 (-4.2, 12.2) | 0.392 | 0.1 (-2.2, 2.4) | 0.937 | 26.2 (17.6, 34.8) | 0.000 *** |
| HostZERO | 6.0 (-2.2, 14.2) | 0.204 | 6.3 (4.0, 8.6) | 0.000 *** | 45.2 (36.6, 53.8) | 0.000 *** |
| MolYsis | 12.0 (3.8, 20.2) | 0.017 * | 3.1 (0.8, 5.4) | 0.019 * | 51.8 (43.2, 60.4) | 0.000 *** |
| QIAamp | 6.2 (-2.0, 14.4) | 0.190 | 6.3 (4.0, 8.6) | 0.000 *** | 35.2 (26.6, 43.8) | 0.000 *** |
save_kable(tableS11, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/tableS11.html", self_contained = T)
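
The six `*_kbl` pipelines above differ only in the model they start from. As a minimal sketch (not the code used for the report), the shared steps could be wrapped in one helper. The function name `tidy_lmer_table` is hypothetical, `parm = "beta_"` is used so that `confint()` returns fixed effects only, and the example model is fitted to the `sleepstudy` data shipped with lme4 rather than to the study data.

```r
library(dplyr)
library(lmerTest)  # attaches lme4 and adds Pr(>|t|) to summary() of lmer fits

# Hypothetical helper: one formatted row per fixed effect of an lmer model
tidy_lmer_table <- function(model, digits = 1) {
        # Round, force a fixed number of decimals, and drop padding spaces
        fmt <- function(x) trimws(format(round(x, digits), nsmall = digits))
        coefs <- as.data.frame(summary(model)$coefficients)
        # Profile confidence intervals (the confint() default), fixed effects only
        ci <- confint(model, parm = "beta_")
        ci <- ci[rownames(coefs), , drop = FALSE]
        p <- coefs[["Pr(>|t|)"]]
        data.frame(
                "Effect size (95% CI)" = paste0(fmt(coefs$Estimate), " (",
                                                fmt(ci[, 1]), ", ", fmt(ci[, 2]), ")"),
                "<i>p</i>-value" = round(p, 3),
                " " = case_when(p < 0.001 ~ "***",
                                p < 0.01 ~ "**",
                                p < 0.05 ~ "*",
                                TRUE ~ " "),
                row.names = rownames(coefs),
                check.names = FALSE
        )
}

# Example on a built-in data set (not the study data)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
tbl <- tidy_lmer_table(fit)
tbl
```

With such a helper, each table would reduce to one call per model followed by the row renaming and row selection that are specific to this report.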

Beta diversity plot

figd2a <- ordinate(subset_samples(phyloseq_rel_nz, sample_type != "Neg." & sample_type != "Mock"), method = "PCoA", distance = "horn") %>%
        plot_ordination(phyloseq_rel_nz, ., color = "treatment") +
        #scale_color_viridis(discrete = 6, name = "Treatment", labels = c("Mock theoretical", "Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                           name = "Treatment",
                           breaks = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
                           labels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #colors are the ColorBrewer "Paired" palette: https://colorbrewer2.org/#type=qualitative&scheme=Paired&n=6
        #scale_shape(name = "Sample type", labels = c("Mock theoretical", "Mock")) +
        geom_point(size = 3) +
        theme_classic (base_size = 12, base_family = "sans") +
        facet_wrap(~sample_type, scales = "free") +
        labs(tag = "A") +
        ggtitle("Prevalence & abundance filtered data") +
        theme(plot.tag = element_text(size = 15), legend.position = "top")# +
        #stat_ellipse(type = "norm") +
        #stat_ellipse(type = "t")


figd2b <- ordinate(subset_samples(phyloseq_decontam$phyloseq_rel, sample_type != "Neg." & sample_type != "Mock" &
                                          S.obs != 0), method = "PCoA", distance = "horn") %>%
        plot_ordination(phyloseq_decontam$phyloseq_rel, ., color = "treatment") +
        #scale_color_viridis(discrete = 6, name = "Treatment", labels = c("Mock theoretical", "Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                           name = "Treatment",
                           breaks = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
                           labels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #colors are the ColorBrewer "Paired" palette: https://colorbrewer2.org/#type=qualitative&scheme=Paired&n=6
        #scale_shape(name = "Sample type", labels = c("Mock theoretical", "Mock")) +
        geom_point(size = 3) +
        theme_classic (base_size = 12, base_family = "sans") +
        facet_wrap(~sample_type, scales = "free") +
        labs(tag = "B") +
        ggtitle("Decontaminated data 1") +
        theme(plot.tag = element_text(size = 15), legend.position = "top")# +
        #stat_ellipse(type = "norm") +
        #stat_ellipse(type = "t")


figd2c <- ordinate(subset_samples(phyloseq_tv, sample_type != "Neg." & sample_type != "Mock"), method = "PCoA", distance = "horn") %>%
        plot_ordination(phyloseq_tv, ., color = "treatment") +
        #scale_color_viridis(discrete = 6, name = "Treatment", labels = c("Mock theoretical", "Control","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) +
        scale_color_manual(values = c("#e31a1c", "#fb9a99", "#33a02c", "#b2df8a", "#1f78b4", "#a6cee3"),
                           name = "Treatment",
                           breaks = c("Untreated","lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
                           labels = c("Untreated", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp")) + #colors are the ColorBrewer "Paired" palette: https://colorbrewer2.org/#type=qualitative&scheme=Paired&n=6
        #scale_shape(name = "Sample type", labels = c("Mock theoretical", "Mock")) +
        geom_point(size = 3) +
        theme_classic (base_size = 12, base_family = "sans") +
        facet_wrap(~sample_type, scales = "free") +
        labs(tag = "C") +
        ggtitle("Decontaminated data 2") +
        theme(plot.tag = element_text(size = 15), legend.position = "top")# +
        #stat_ellipse(type = "norm") +
        #stat_ellipse(type = "t")


figA3 <- ggarrange(figd2a, figd2b, figd2c, common.legend = T, nrow = 3)
figA3
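
The three panels are PCoA ordinations on the "horn" distance. As an illustration of what that distance measures, the sketch below computes the Morisita-Horn dissimilarity by hand, using the formula documented for `vegan::vegdist(method = "horn")` (assumed here to be what `distance = "horn"` resolves to). The abundance values are made up, and `cmdscale()` is used as a base-R stand-in for the PCoA step, so the coordinates are not expected to match the figures.

```r
# Morisita-Horn dissimilarity between two abundance vectors
horn_dissimilarity <- function(x, y) {
        n_x <- sum(x)
        n_y <- sum(y)
        lambda_x <- sum(x^2) / n_x^2
        lambda_y <- sum(y^2) / n_y^2
        1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * n_x * n_y)
}

# Toy abundance table (made-up values): samples in rows, taxa in columns
abund <- rbind(
        untreated_1 = c(10, 0, 0),
        untreated_2 = c(8, 2, 0),
        treated_1   = c(0, 5, 5),
        treated_2   = c(0, 0, 10)
)

# Pairwise dissimilarity matrix
n <- nrow(abund)
d <- matrix(0, n, n, dimnames = list(rownames(abund), rownames(abund)))
for (i in seq_len(n)) {
        for (j in seq_len(n)) {
                d[i, j] <- horn_dissimilarity(abund[i, ], abund[j, ])
        }
}

# Classical multidimensional scaling on the dissimilarities
coords <- cmdscale(as.dist(d), k = 2)
round(d, 3)
```

Samples that share no taxa get a dissimilarity of 1 and identical profiles get 0, which is why samples whose composition shifts strongly after treatment move far from their untreated counterparts in the ordination.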

List of contaminants (Tinyvamp)

#Stratified by sample type

prev_neg
prev_all
sample_data(phyloseq_unfiltered$phyloseq_rel)$is.neg <- grepl("Neg", sample_data(phyloseq_unfiltered$phyloseq_rel)$sample_type)

# Taxa present in the unfiltered data but absent from the tinyvamp-decontaminated data
contaminants_tv <- data.frame(
        Taxa = setdiff(taxa_names(phyloseq_unfiltered$phyloseq_count),
                       taxa_names(phyloseq_tv))
)
       


merged_contaminants <- merge(contaminants_tv, prev_all %>% rownames_to_column("Taxa"), by = "Taxa") %>%
        merge(., prev_neg %>% rownames_to_column("Taxa"), by = "Taxa") %>%
        dplyr::select(c("Taxa", "Prevalence (all)", "Prevalence (negative controls)")) %>%
        .[order(-.$"Prevalence (all)", -.$"Prevalence (negative controls)"),] %>%
        remove_rownames() %>%
        subset(., .$"Prevalence (all)" != 0)


tableA3 <- merged_contaminants %>% 
        mutate(Taxa = species_italic2(Taxa)) %>%
        kbl(format = "html", escape = F) %>%
        kable_styling(full_width = 0, html_font = "sans") 

tableA3
Taxa Prevalence (all) Prevalence (negative controls)
Staphylococcus epidermidis 41 1
Streptococcus mitis 39 1
Gemella haemolysans 37 1
Streptococcus oralis 37 0
Corynebacterium atypicum 33 0
Streptococcus anginosus 33 0
Staphylococcus argenteus 32 2
Actinomyces odontolyticus 32 0
Pseudomonas aeruginosa 32 0
Slackia isoflavoniconvertens 31 0
Collinsella intestinalis 30 0
Actinomyces oris 29 0
Streptococcus salivarius 29 0
Actinomyces sp. HPA0247 28 0
Actinomyces sp. oral taxon 181 28 0
Gemella morbillorum 27 0
Rothia dentocariosa 27 0
Streptococcus infantis 27 0
Actinomyces sp. HMSC035G02 26 0
Actinomyces sp. S6 Spd3 26 0
Atopobium rimae 26 0
Olsenella scatoligenes 26 0
Prevotella melaninogenica 26 0
Streptococcus gordonii 25 0
Actinomyces sp. ICM47 24 0
Cutibacterium granulosum 24 0
Gemella bergeri 24 0
Streptococcus australis 24 0
Streptococcus sanguinis 24 0
Veillonella dispar 24 0
Actinomyces naeslundii 23 0
Streptococcus sp. F0442 23 0
Granulicatella adiacens 22 0
Staphylococcus schweitzeri 21 2
Collinsella stercoris 21 0
Streptococcus peroris 21 0
Actinomyces sp. oral taxon 180 20 0
Eubacterium infirmum 20 0
Streptococcus sp. A12 20 0
Streptococcus vestibularis 20 0
Veillonella atypica 20 0
Actinomyces viscosus 19 0
Streptococcus pseudopneumoniae 19 0
Streptococcus sp. HPH0090 19 0
Propionibacterium namnetense 18 1
Corynebacterium durum 18 0
Eubacterium brachy 18 0
Propionibacterium humerusii 18 0
Streptococcus pneumoniae 18 0
Abiotrophia defectiva 17 0
Enorma massiliensis 17 0
Parvimonas sp. oral taxon 393 17 0
Porphyromonas somerae 17 0
Prevotella histicola 17 0
Veillonella infantium 17 0
Actinomyces johnsonii 16 0
Actinomyces meyeri 16 0
Actinomyces sp. oral taxon 170 16 0
Mogibacterium pumilum 16 0
Olsenella profusa 16 0
Rothia aeria 16 0
Actinomyces massiliensis 15 0
Corynebacterium pseudodiphtheriticum 15 0
Corynebacterium pseudogenitalium 15 0
Prevotella salivae 15 0
Streptococcus mutans 15 0
Streptococcus sp. HMSC034E03 15 0
Streptococcus sp. M334 15 0
Candida parapsilosis 14 0
Gemella asaccharolytica 14 0
Neisseria subflava 14 0
Streptococcus cristatus 14 0
Veillonella sp. T11011 6 14 0
Parvimonas sp. oral taxon 110 13 0
Prevotella jejuni 13 0
Streptococcus sp. HMSC067H01 13 0
Streptococcus sp. HMSC071D03 13 0
Candida dubliniensis 12 0
Mogibacterium timidum 12 0
Oribacterium sp. oral taxon 078 12 0
Bifidobacterium dentium 11 0
Streptococcus milleri 11 0
Actinomyces georgiae 10 0
Actinomyces hongkongensis 10 0
Cardiobacterium valvarum 10 0
Prevotella pallens 10 0
Stenotrophomonas rhizophila 10 0
Tannerella sp. oral taxon HOT 286 10 0
Atopobium deltae 9 0
Corynebacterium matruchotii 9 0
Lactobacillus rhamnosus 9 0
Oribacterium asaccharolyticum 9 0
Prevotella oris 9 0
Stenotrophomonas pavanii 9 0
Actinomyces sp. oral taxon 414 8 0
Actinomyces sp. oral taxon 448 8 0
Actinomyces sp. oral taxon 897 8 0
Eubacterium nodatum 8 0
Oribacterium parvum 8 0
Prevotella sp. oral taxon 306 8 0
Lactobacillus gasseri 7 0
Streptococcus sobrinus 7 0
Achromobacter ruhlandii 6 0
Candida orthopsilosis 6 0
Corynebacterium tuberculostearicum 6 0
Porphyromonas catoniae 6 0
Staphylococcus capitis 6 0
Staphylococcus haemolyticus 6 0
Alloprevotella rava 5 0
Cutibacterium avidum 5 0
Leptotrichia sp. oral taxon 215 5 0
Prevotella buccae 5 0
Streptococcus sp. oral taxon 056 5 0
Tannerella forsythia 5 0
Streptococcus thermophilus 4 2
Achromobacter denitrificans 4 0
Bifidobacterium breve 4 0
Capnocytophaga leadbetteri 4 0
Corynebacterium aurimucosum 4 0
Eubacterium saphenum 4 0
Leptotrichia sp. oral taxon 212 4 0
Peptostreptococcus sp. MV1 4 0
Streptococcus viridans 4 0
Actinomyces cardiffensis 3 0
Actinomyces radingae 3 0
Enterococcus avium 3 0
Neisseria sicca 3 0
Porphyromonas asaccharolytica 3 0
Scardovia inopinata 3 0
Streptococcus pyogenes 3 0
Streptococcus sp. SK643 3 0
Listeria floridensis 2 1
Malassezia globosa 2 1
Actinomyces denticolens 2 0
Anaerococcus octavius 2 0
Atopobium minutum 2 0
Corynebacterium afermentans 2 0
Corynebacterium kroppenstedtii 2 0
Lactobacillus paragasseri 2 0
Lactobacillus reuteri 2 0
Lactobacillus salivarius 2 0
Mycobacterium intracellulare 2 0
Neisseria elongata 2 0
Prevotella nigrescens 2 0
Prevotella sp. F0091 2 0
Streptococcus sp. NLAE zl C503 2 0
Actinomyces turicensis 1 0
Aspergillus eucalypticola 1 0
Aspergillus kawachii 1 0
Aspergillus lacticoffeatus 1 0
Aspergillus niger 1 0
Aspergillus phoenicis 1 0
Aspergillus sydowii 1 0
Aspergillus thermomutatus 1 0
Aspergillus tubingensis 1 0
Aspergillus turcosus 1 0
Aspergillus vadensis 1 0
Aspergillus welwitschiae 1 0
Campylobacter gracilis 1 0
Campylobacter mucosalis 1 0
Campylobacter showae 1 0
Candida tropicalis 1 0
Capnocytophaga granulosa 1 0
Capnocytophaga sputigena 1 0
Corynebacterium pyruviciproducens 1 0
Dialister micraerophilus 1 0
Eubacterium rectale 1 0
Fusobacterium periodonticum 1 0
Fusobacterium sp. oral taxon 370 1 0
Haemophilus sp. HMSC71H05 1 0
Klebsiella michiganensis 1 0
Klebsiella oxytoca 1 0
Lactobacillus oris 1 0
Leptotrichia buccalis 1 0
Leptotrichia hofstadii 1 0
Leptotrichia sp. oral taxon 225 1 0
Leptotrichia sp. oral taxon 498 1 0
Leptotrichia sp. oral taxon 879 1 0
Moraxella catarrhalis 1 0
Mycolicibacterium fortuitum 1 0
Neisseria macacae 1 0
Neisseria mucosa 1 0
Porphyromonas canoris 1 0
Porphyromonas uenonis 1 0
Prevotella denticola 1 0
Prevotella intermedia 1 0
Prevotella oulorum 1 0
Prevotella pleuritidis 1 0
Prevotella scopos 1 0
Rickettsia typhi 1 0
Selenomonas flueggei 1 0
Selenomonas noxia 1 0
Selenomonas sp. oral taxon 892 1 0
Selenomonas sp. oral taxon 920 1 0
Serratia liquefaciens 1 0
Streptococcus downei 1 0
Streptococcus massiliensis 1 0
Streptococcus salivarius CAG 79 1 0
Streptococcus sinensis 1 0
Streptococcus sp. DD11 1 0
Streptococcus sp. HMSC070B10 1 0
Veillonella tobetsuensis 1 0
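
The objects `prev_all` and `prev_neg` are defined earlier in the report. As a rough sketch of the idea behind the table, prevalence can be counted as the number of samples in which a taxon is detected, and the contaminant list as the taxa that disappear after decontamination. Everything below is hypothetical toy data: the count matrix, the sample names, and the set of retained taxa. It also assumes that "Prevalence (all)" counts negative controls too, which may differ from the definition used above.

```r
# Toy count matrix (made-up values): taxa in rows, samples in columns
counts <- matrix(c(5, 0, 3, 1,
                   0, 0, 2, 0,
                   7, 4, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Taxon A", "Taxon B", "Taxon C"),
                                 c("bal_1", "ns_1", "spt_1", "neg_1")))

# Flag negative controls by sample name
is_neg <- grepl("neg", colnames(counts))

# Taxa retained after decontamination (hypothetical)
kept_taxa <- c("Taxon C")

# Contaminants: taxa in the full table that were not retained
removed <- setdiff(rownames(counts), kept_taxa)

# Prevalence: number of samples with a non-zero count
prevalence <- data.frame(
        Taxa = removed,
        "Prevalence (all)" = rowSums(counts[removed, , drop = FALSE] > 0),
        "Prevalence (negative controls)" = rowSums(counts[removed, is_neg, drop = FALSE] > 0),
        check.names = FALSE,
        row.names = NULL
)

# Sort as in the table above: most prevalent first
prevalence <- prevalence[order(-prevalence[["Prevalence (all)"]],
                               -prevalence[["Prevalence (negative controls)"]]), ]
prevalence
```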

3.6. Summary

Mean of species richness change by sample

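phyloseq$phyloseq_count %>% 
        sample_data %>%
        data.frame() %>%
        group_by(subject_id) %>%
        # subject_id is the grouping column and is kept automatically;
        # S.obs_untreated assumes one "Untreated" sample per subject_id
        reframe(treatment = treatment,
                S.obs = S.obs,
                S.obs_untreated = S.obs[treatment == "Untreated"][1])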
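
The heading refers to a mean change in species richness. One way to get there, sketched on made-up numbers, is to subtract each subject's untreated richness from the treated values and then average the differences per treatment. The column names follow the chunk above; the data frame `richness`, the derived column `S.obs_change`, and the assumption of exactly one untreated sample per subject are illustrative only.

```r
library(dplyr)

# Toy data (made-up values): two subjects, three treatments each
richness <- data.frame(
        subject_id = rep(c("s1", "s2"), each = 3),
        treatment  = rep(c("Untreated", "lyPMA", "MolYsis"), times = 2),
        S.obs      = c(10, 12, 20, 8, 8, 15)
)

# Pair every sample with the untreated richness of the same subject
richness_change <- richness %>%
        group_by(subject_id) %>%
        mutate(S.obs_untreated = S.obs[treatment == "Untreated"][1],
               S.obs_change = S.obs - S.obs_untreated) %>%
        ungroup()

# Mean change per treatment, excluding the untreated reference itself
mean_change <- richness_change %>%
        filter(treatment != "Untreated") %>%
        group_by(treatment) %>%
        summarise(mean_change = mean(S.obs_change), .groups = "drop")

mean_change
```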

Table 2. Sequencing summary

Table 2. Summary of sequencing issues, significant effects in the linear mixed-effects model of species richness (species richness ~ treatment + (1|subject)), changes in microbial beta diversity, and significant effects in the linear mixed-effects model of functional richness (functional richness ~ treatment + (1|subject)). The linear mixed-effects models were stratified by sample type and fitted to data after prevalence and abundance filtering.

MolYsis for BAL, QIAamp for nasal swab, and HostZERO for sputum.

summary_host_ratio <- rbind(
        hr_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              hr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("% Host" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("% Host")),
        
        hr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              hr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("% Host" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("% Host")) ,
        
 hr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              hr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("% Host" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%  
         column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("% Host")) 
)
        

summary_final_reads <- 
        rbind(
        fr_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              fr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("log<sub>10</sub>(Final reads)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("log<sub>10</sub>(Final reads)")),
        
        fr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              fr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("log<sub>10</sub>(Final reads)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("log<sub>10</sub>(Final reads)")) ,
        
 fr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              fr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("log<sub>10</sub>(Final reads)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%  
         column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("log<sub>10</sub>(Final reads)")) 
)
        

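# Microbial species richness: one column of "estimate (lower, upper)stars"
# strings, stacking the six fixed-effect rows of the bal, ns and spt models.
summary_species_richness <-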
        rbind(
        sr_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              sr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Species richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Species richness")),
        
        sr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              sr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Species richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Species richness")) ,
        
 sr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              sr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Species richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%  
         column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Species richness")) 
) %>% 
        dplyr::rename("Microbial species richness" = "Species richness")
        

# Functional richness: same layout as above, built from the pfr_lmer_* models.
summary_function_richness <-
        rbind(
        pfr_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              pfr_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Functional richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Functional richness")),
        
        pfr_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              pfr_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Functional richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Functional richness")) ,
        
 pfr_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              pfr_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Functional richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%  
         column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Functional richness")) 
)
        

# Viral species richness: same layout, built from the sr_lmer_*_v models.
summary_viral_richness <-
        rbind(
        sr_lmer_bal_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              sr_lmer_bal_v %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Species richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Species richness")),
        
        sr_lmer_ns_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              sr_lmer_ns_v %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Species richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Species richness")) ,
        
 sr_lmer_spt_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              sr_lmer_spt_v %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Species richness" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%  
         column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Species richness")) 
) %>%
        dplyr::rename("Viral species richness" = "Species richness")
        

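# Microbial bias (Morisita-Horn): same layout, built from the mh_lmer_* models.
# Lower-case treatment labels are mapped back to their display names.
summary_bias <-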
        rbind(
        
 mh_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              mh_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Bias (Morisita-Horn)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = ""),
                "Row.names" = case_when(`Row.names` == "benzonase" ~ "Benzonase",
                                      `Row.names` == "host_zero" ~ "HostZERO",
                                      `Row.names` == "lypma" ~ "lyPMA",
                                      `Row.names` == "molysis" ~ "MolYsis",
                                      `Row.names` == "qiaamp" ~ "QIAamp",
                                      .default = `Row.names`)
                ) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Bias (Morisita-Horn)")),
        
        
 mh_lmer_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              mh_lmer_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Bias (Morisita-Horn)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = ""),
                "Row.names" = case_when(`Row.names` == "benzonase" ~ "Benzonase",
                                      `Row.names` == "host_zero" ~ "HostZERO",
                                      `Row.names` == "lypma" ~ "lyPMA",
                                      `Row.names` == "molysis" ~ "MolYsis",
                                      `Row.names` == "qiaamp" ~ "QIAamp",
                                      .default = `Row.names`)
                ) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Bias (Morisita-Horn)")),
        
 
 mh_lmer_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              mh_lmer_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Bias (Morisita-Horn)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = ""),
                "Row.names" = case_when(`Row.names` == "benzonase" ~ "Benzonase",
                                      `Row.names` == "host_zero" ~ "HostZERO",
                                      `Row.names` == "lypma" ~ "lyPMA",
                                      `Row.names` == "molysis" ~ "MolYsis",
                                      `Row.names` == "qiaamp" ~ "QIAamp",
                                      .default = `Row.names`)
                ) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Bias (Morisita-Horn)"))
 
)  %>%
        dplyr::rename("Microbial bias (Morisita-Horn)" = "Bias (Morisita-Horn)")
        
# Proportion of gram-negative taxa: same layout, built from the
# gram_neg_prop_* models.
summary_gram_negative <-
        rbind(
        gram_neg_prop_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              gram_neg_prop_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("% gram negative" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("% gram negative")),
        
        gram_neg_prop_ns %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              gram_neg_prop_ns %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("% gram negative" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("% gram negative")) ,
        
 gram_neg_prop_spt %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              gram_neg_prop_spt %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("% gram negative" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = "")) %>%  
         column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("% gram negative")) 
) 
        


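# Viral bias (Morisita-Horn): same layout. The first block reuses the
# mh_lmer_bal row names and fills every cell with "-".
summary_bias_viral <-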
        rbind(     
                
mh_lmer_bal_kbl_v <-  
       mh_lmer_bal %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              mh_lmer_bal %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Bias (Morisita-Horn)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = ""),
                "Row.names" = case_when(`Row.names` == "benzonase" ~ "Benzonase",
                                      `Row.names` == "host_zero" ~ "HostZERO",
                                      `Row.names` == "lypma" ~ "lyPMA",
                                      `Row.names` == "molysis" ~ "MolYsis",
                                      `Row.names` == "qiaamp" ~ "QIAamp",
                                      .default = `Row.names`)
                ) %>%
                column_to_rownames("Row.names") %>%
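          # Placeholder rows: overwrite the formatted estimates with "-" so that
          # only the row names of the model above are carried into the table.
          mutate("Bias (Morisita-Horn)" = "-") %>%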
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Bias (Morisita-Horn)")),
       
        
 mh_lmer_ns_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = F) %>% 
        merge(., 
              mh_lmer_ns_v %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Bias (Morisita-Horn)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = ""),
                "Row.names" = case_when(`Row.names` == "benzonase" ~ "Benzonase",
                                      `Row.names` == "host_zero" ~ "HostZERO",
                                      `Row.names` == "lypma" ~ "lyPMA",
                                      `Row.names` == "molysis" ~ "MolYsis",
                                      `Row.names` == "qiaamp" ~ "QIAamp",
                                      .default = `Row.names`)
                ) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Bias (Morisita-Horn)")),
        
 
 mh_lmer_spt_v %>%
        summary() %>%
        .$coefficients %>%
        data.frame(check.names = FALSE) %>% 
        merge(., 
              # Confidence intervals must come from the same viral model as the estimates
              mh_lmer_spt_v %>% confint() %>%
                        round(digits = 1) %>% 
                        format(nsmall = 1) %>%
                        as.data.frame() %>%
                        mutate(conf = paste("(",
                                            gsub(" ","", .[,1]),
                                            ", ",
                                            gsub(" ","", .[,2]),
                                            ")",
                                            sep = "")),
              by = 0
              ) %>%
         mutate(` ` = case_when(abs(`Pr(>|t|)`) < 0.001 ~ "***",
                               abs(`Pr(>|t|)`) < 0.01 ~ "**",
                               abs(`Pr(>|t|)`) < 0.05 ~ "*",
                               .default = " ")) %>% 
        mutate(`Row.names` = gsub("treatment|sample_type", "", `Row.names`)) %>%
         mutate(`Row.names` = gsub(":", " * ", `Row.names`)) %>%
         mutate("Bias (Morisita-Horn)" = 
                       paste(round(Estimate, digits = 1) %>%
                                     format(nsmall = 1),
                             " ",
                             conf, 
                             ` `,
                             sep = ""),
                "Row.names" = case_when(`Row.names` == "benzonase" ~ "Benzonase",
                                      `Row.names` == "host_zero" ~ "HostZERO",
                                      `Row.names` == "lypma" ~ "lyPMA",
                                      `Row.names` == "molysis" ~ "MolYsis",
                                      `Row.names` == "qiaamp" ~ "QIAamp",
                                      .default = `Row.names`)
                ) %>%
                column_to_rownames("Row.names") %>%
        .[c("(Intercept)",
                  "lyPMA",
                  "Benzonase",
                  "HostZERO",
                  "MolYsis",
                  "QIAamp"),] %>%
         dplyr::select(c("Bias (Morisita-Horn)"))
) %>% dplyr::rename("Viral bias (Morisita-Horn)" = "Bias (Morisita-Horn)")
        

#summary_comment <- 
#        c("", "Low efficiency", "Low efficiency", 
#          "No increased richness", "Optimal", "Low efficiency",
#          
#          "","No increased richness", "Low efficiency",
#          "High rate of library prep failure", "High bias", "Optimal",
#          
#          "", "Low efficiency", "Low efficiency",
#          "Optimal<sup>1</sup>", "High bias<sup>2</sup>", "High bias")



table3 <- cbind(`Sample type` = c("", "BAL", "", "", "", "",
        "", "Nasal swabs", "", "", "", "",
        "", "Sputum", "", "", "", ""),
      `Treatment` = c("", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp", 
        "", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp", 
        "", "lyPMA", "Benzonase", "HostZERO", "MolYsis", "QIAamp"),
      summary_host_ratio,
      summary_final_reads,
      summary_species_richness,
      summary_function_richness,
      summary_viral_richness,
      summary_bias,
      summary_bias_viral,
      summary_gram_negative
      #Comment = summary_comment
      ) %>% 
        subset(., .$Treatment != "") %>%
        remove_rownames() %>%
        kbl(format = "html", escape = FALSE) %>%
        add_footnote(c("Unable to run statistical tests"), notation = "alphabet") %>%
        kable_styling(full_width = FALSE, html_font = "sans")

save_kable(table3, file = "Project_SICAS2_microbiome/7_Manuscripts/2022_MGK_Host_Depletion/Figures/ta_updated.html", self_contained = TRUE)

table3
| Sample type | Treatment | % Host | log10(Final reads) | Microbial species richness | Functional richness | Viral species richness | Microbial bias (Morisita-Horn) | Viral bias (Morisita-Horn) | % gram negative |
|---|---|---|---|---|---|---|---|---|---|
| BAL | lyPMA | -3.1 (-15.7, 9.5) | 0.4 (-0.2, 0.9) | 1.7 (-10.7, 14.2) | 63.0 (-48.7, 174.7) | 0.4 (-10.7, 14.2) | 0.3 (0.1, 0.6)* | - | 6.4 (-21.2, 34.1) |
| | Benzonase | -1.1 (-13.8, 11.5) | 0.8 (0.3, 1.3)* | 5.7 (-6.2, 17.6) | 137.2 (25.5, 248.9)* | 5.6 (-6.2, 17.6) | 0.1 (-0.1, 0.3) | - | 15.3 (-12.3, 43.0) |
| | HostZERO | -18.3 (-30.9, -5.6)* | 1.0 (0.4, 1.5)** | 8.9 (-3.0, 20.8) | 177.8 (66.1, 289.5)** | 6.6 (-3.0, 20.8) | 0.3 (0.0, 0.5) | - | 8.6 (-19.0, 36.3) |
| | MolYsis | -17.7 (-30.3, -5.1)* | 1.0 (0.5, 1.6)** | 18.9 (7.0, 30.8)* | 203.2 (91.5, 314.9)*** | 15.0 (7.0, 30.8)* | 0.2 (0.0, 0.4) | - | 3.4 (-24.3, 31.0) |
| | QIAamp | -6.3 (-18.9, 6.3) | 1.0 (0.5, 1.6)** | 9.5 (-2.4, 21.4) | 139.4 (27.7, 251.1)* | 10.6 (-2.4, 21.4) | 0.3 (0.0, 0.5) | - | 7.3 (-20.4, 35.0) |
| Nasal swabs | lyPMA | -27.7 (-49.0, -6.3)* | -0.5 (-1.0, -0.1)* | -4.8 (-9.2, -0.4) | 7.8 (-26.7, 42.3) | -6.4 (-9.2, -0.4)* | 0.2 (0.1, 0.3)* | 0.5 (0.1, 0.3)*** | 19.4 (14.1, 24.6)*** |
| | Benzonase | -20.0 (-41.4, 1.5) | 0.1 (-0.3, 0.6) | -0.4 (-4.8, 4.1) | 23.4 (-11.1, 57.9) | -3.9 (-4.8, 4.1) | 0.2 (0.0, 0.3)* | 0.1 (0.0, 0.3) | 1.9 (-3.4, 7.5) |
| | HostZERO | -73.6 (-94.9, -52.1)*** | 0.9 (0.4, 1.3)** | 10.0 (5.6, 14.5)*** | 69.4 (34.9, 103.9)*** | 4.1 (5.6, 14.5) | 0.0 (-0.1, 0.2) | 0.1 (-0.1, 0.2) | 0.0 (-5.3, 5.6) |
| | MolYsis | -50.6 (-72.0, -29.3)*** | 0.2 (-0.2, 0.7) | 6.2 (1.8, 10.6)* | 56.2 (21.7, 90.7)** | -0.2 (1.8, 10.6) | 0.2 (0.1, 0.3)** | 0.2 (0.1, 0.3) | 2.3 (-3.0, 7.6) |
| | QIAamp | -75.4 (-96.9, -54.0)*** | 1.1 (0.6, 1.5)*** | 7.8 (3.3, 12.2)** | 69.4 (34.9, 103.9)*** | 4.1 (3.3, 12.2) | 0.1 (0.0, 0.3) | 0.4 (0.0, 0.3)** | 0.1 (-5.4, 5.5) |
| Sputum | lyPMA | -3.8 (-15.4, 7.8) | 0.5 (0.3, 0.8)** | 37.6 (19.5, 55.7)** | 86.0 (19.6, 152.4)* | 4.2 (19.5, 55.7) | 0.3 (0.1, 0.6)** | 0.2 (0.1, 0.6) | -40.9 (-52.6, -29.1)*** |
| | Benzonase | -6.3 (-17.9, 5.4) | 0.8 (0.6, 1.1)*** | 66.6 (48.5, 84.7)*** | 91.6 (25.2, 158.0)** | 19.2 (48.5, 84.7) | 0.5 (0.3, 0.7)*** | 0.3 (0.3, 0.7) | -52.5 (-64.2, -40.7)*** |
| | HostZERO | -45.5 (-57.1, -33.8)*** | 1.7 (1.4, 1.9)*** | 103.0 (84.9, 121.1)*** | 145.8 (79.4, 212.2)*** | 91.8 (84.9, 121.1)*** | 0.6 (0.4, 0.8)*** | 0.7 (0.4, 0.8)** | -59.9 (-71.6, -48.1)*** |
| | MolYsis | -69.6 (-81.3, -58.0)*** | 2.0 (1.7, 2.2)*** | 112.8 (94.7, 130.9)*** | 150.4 (84.0, 216.8)*** | 118.4 (94.7, 130.9)*** | 0.6 (0.4, 0.8)*** | 0.8 (0.4, 0.8)*** | -59.9 (-71.6, -48.1)*** |
| | QIAamp | -18.7 (-30.3, -7.1)** | 1.4 (1.2, 1.7)*** | 85.2 (67.1, 103.3)*** | 123.6 (57.2, 190.0)*** | 46.6 (67.1, 103.3)* | 0.6 (0.4, 0.8)*** | 0.7 (0.4, 0.8)** | -60.6 (-72.3, -48.8)*** |

a Unable to run statistical tests
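
Each cell in the table above has the form `estimate (lower, upper)` followed by significance stars. A minimal sketch of that formatting as a reusable helper is given here. `format_estimate_ci` is a hypothetical name that does not appear in the original analysis, and a base-R `lm` fit on the built-in `mtcars` data stands in for the mixed models, which are not reproduced. The sketch assumes that the estimate, interval, and p-value for a cell are all taken from the same fitted object.

```r
# Hypothetical helper (not part of the original analysis): build one table cell
# as "estimate (lower, upper)stars" from values taken from a single model.
format_estimate_ci <- function(estimate, lower, upper, p_value, digits = 1) {
        # Round, then print with a fixed number of decimals (e.g. 1 -> "1.0")
        fmt <- function(x) formatC(round(x, digits), format = "f", digits = digits)
        # Same thresholds as the case_when() calls in the pipeline
        stars <- ifelse(p_value < 0.001, "***",
                 ifelse(p_value < 0.01, "**",
                 ifelse(p_value < 0.05, "*", "")))
        paste0(fmt(estimate), " (", fmt(lower), ", ", fmt(upper), ")", stars)
}

# Stand-in model: an ordinary linear model on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients
ci <- confint(fit)  # computed from the same fitted object as `coefs`

cells <- format_estimate_ci(coefs[, "Estimate"],
                            ci[, 1],
                            ci[, 2],
                            coefs[, "Pr(>|t|)"])
print(cells)
```

Unlike the pipeline above, this sketch appends nothing to non-significant terms instead of a trailing space.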

Done.

Bibliography

#===============================================================================
#BTC.LineZero.Footer.1.1.0
#===============================================================================
#R markdown citation generator.
#===============================================================================
#RLB.Dependencies:
#   magrittr, pacman, stringr
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#BTC.Dependencies:
#   LineZero.Header
#===============================================================================
#Generates citations for each explicitly loaded library.
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
str_libraries <- c("r", str_libraries)
# Use a separate loop variable so the vector of library names is not overwritten
for (str_library in str_libraries) {
    str_library |>
        pacman::p_citation() |>
        print(bibtex = FALSE) |>
        capture.output() %>%
        .[-1:-3] %>% .[. != ""] |>
        stringr::str_squish() |>
        stringr::str_replace("_", "") |>
        cat()
    cat("\n")
}
## Computing. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>. We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also 'citation("pkgname")' for citing R packages.
## version 1.4.3, <https://CRAN.R-project.org/package=readxl>.
## graphics of microbiome census data. Paul J. McMurdie and Susan Holmes (2013) PLoS ONE 8(4):e61217.
## Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
## R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
## J, reikoch, Beasley W, O'Connor B, Warnes GR, Quinn M, Kamvar ZN, Gao C (2024). yaml: Methods to Convert R Data to YAML and Back. R package version 2.3.10, <https://CRAN.R-project.org/package=yaml>. ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")'.
## Springer-Verlag New York, 2016.
## O'Hara R, Solymos P, Stevens M, Szoecs E, Wagner H, Barbour M, Bedward M, Bolker B, Borcard D, Carvalho G, Chirico M, De Caceres M, Durand S, Evangelista H, FitzJohn R, Friendly M, Furneaux B, Hannigan G, Hill M, Lahti L, McGlinn D, Ouellette M, Ribeiro Cunha E, Smith T, Stier A, Ter Braak C, Weedon J (2024). vegan: Community Ecology Package. R package version 2.6-8, <https://CRAN.R-project.org/package=vegan>.
## http://microbiome.github.io
## Plots. R package version 0.6.0, <https://CRAN.R-project.org/package=ggpubr>.
## Sciaini, and Cédric Scherer (2024). viridis(Lite) - Colorblind-Friendly Color Maps for R. viridis package version 0.6.5.
## "Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data." bioRxiv, 221499. doi:10.1101/221499 <https://doi.org/10.1101/221499>.
## Graphics. R package version 2.3, <https://CRAN.R-project.org/package=gridExtra>.
## Plots. R package version 0.6.0, <https://CRAN.R-project.org/package=ggpubr>.
## Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
## Package: Tests in Linear Mixed Effects Models." Journal of Statistical Software, 82(13), 1-26. doi:10.18637/jss.v082.i13 <https://doi.org/10.18637/jss.v082.i13>.
## R package version 1.5.1, <https://CRAN.R-project.org/package=writexl>.
## Matrices and Other Utilities. R package version 0.2.3, <https://CRAN.R-project.org/package=harrietr>.
## Population-scale Meta-omics Studies, http://huttenhower.sph.harvard.edu/maaslin2. To cite the MaAsLin 2 software, please use: Mallick H, Rahnavard A, McIver LJ (2020). MaAsLin 2: Multivariable Association in Population-scale Meta-omics Studies. R/Bioconductor package, http://huttenhower.sph.harvard.edu/maaslin2.
## for 'ggplot2'. R package version 0.1.2, <https://CRAN.R-project.org/package=ggtext>.
## package version 0.6.0, <https://CRAN.R-project.org/package=ggpmisc>.
## using 'mgcv' and 'lme4'. R package version 0.2-6, <https://CRAN.R-project.org/package=gamm4>. ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")'.
## Journal of Statistical Software, 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/.
## Pipe Syntax. R package version 1.4.0, <https://CRAN.R-project.org/package=kableExtra>.
## Generation in R. R package version 1.48, <https://yihui.org/knitr/>. Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963 Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595
## Visualization of Phylogenetic Trees (1st edition). Chapman and Hall/CRC. doi:10.1201/9781003279242 Shuangbin Xu, Lin Li, Xiao Luo, Meijun Chen, Wenli Tang, Li Zhan, Zehan Dai, Tommy T. Lam, Yi Guan, Guangchuang Yu. Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta 2022, 4(1):e56. doi:10.1002/imt2.56 Guangchuang Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi: 10.1002/cpbi.96 Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 2018, 35(2):3041-3043. doi: 10.1093/molbev/msy194 Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
## Third edition. Sage, Thousand Oaks CA. <https://www.john-fox.ca/Companion/>.
## Imai (2014). mediation: R Package for Causal Mediation Analysis. Journal of Statistical Software, 59(5), 1-38. URL http://www.jstatsoft.org/v59/i05/. For the underlying methods please cite the following papers: Kosuke Imai, Luke Keele and Teppei Yamamoto (2010). Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Statistical Science, 25(1), 51-71. Kosuke Imai, Luke Keele and Dustin Tingley (2010). A General Approach to Causal Mediation Analysis. Psychological Methods, 15(4), 309-334. Kosuke Imai, Luke Keele, Dustin Tingley and Teppei Yamamoto (2011). Unpacking the Black Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies. American Political Science Review, 105(4), 765-789. Kosuke Imai and Teppei Yamamoto (2013). Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments. Political Analysis, 21(2), 141-171. Kosuke Imai, Luke Keele, Dustin Tingley and Teppei Yamamoto (2010). Causal Mediation Analysis Using R. In Advances in Social Science Research Using R, ed. H. D. Vinod, New York: Springer-Verlag.
## package version 0.4.9, <https://CRAN.R-project.org/package=lemon>.
## estimation for false discovery rate control. doi:10.18129/B9.bioc.qvalue <https://doi.org/10.18129/B9.bioc.qvalue>, R package version 2.36.0, <https://bioconductor.org/packages/qvalue>.
#===============================================================================